In this report, a 2D resource evaluation of a Co-Ni deposit is conducted. Different kriging and Gaussian cosimulation methods are compared with regard to their prediction error. Finally, one method is chosen and applied to conduct a grade-tonnage estimation of the deposit. The assigned file contains one layer of data from the Murrin Murrin nickel laterite deposit in Western Australia. The deposit consists of four lithologies:
The metallogenetic model considers that Ni (and, to a lesser degree, Co) is washed out of the FZ and concentrated in the SA and SM zones, while Mg is completely removed from the system. Hence, there are systematic changes in the composition of the ore depending on the zone.
This work consists of analyzing this data set with a general exploratory data analysis, a spatial exploratory analysis, variography, indicator kriging and cokriging, as well as cosimulation. Furthermore, a cross-validation is done to evaluate the resource distribution in the assigned slice of the deposit. This provides an overview of the effect of choosing different interpolation methods. In total, six main methods are compared: three kriging methods and three Gaussian simulation methods. The methods are as follows:
| | Kriging | | Simulation |
|---|---|---|---|
| VK1 | Cokriging, log-transformed values, omnidirectional | VS1 | Cosimulation, log-transformed values, omnidirectional |
| VK2 | Mixed-model cokriging, log-transformed values, incl. lithology, anisotropy | VS2 | Mixed-model cosimulation, log-transformed values, incl. lithology, anisotropy |
| VK3 | Cokriging, logratio compositions, no lithology, omnidirectional | VS3 | Cosimulation, logratio compositions, no lithology, omnidirectional |
| IK | Indicator kriging for lithology, anisotropy | | |
As such, the experimental plan roughly follows the scheme of comparing the estimation approach (kriging vs. simulation) against the modeling variant. While this testing plan does not strictly follow a Design of Experiments approach, it can be viewed as a two-factor testing plan with two levels for the first factor and three for the second.
The six main variants are then compared to each other, and a final model is chosen and applied for a preliminary quantitative resource estimation. The selection of the final model is based on the mean absolute error (MAE), mean relative error (MRE), mean deviation (MD), and cross-validation scatterplots.
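The three comparison metrics can be sketched as follows. This is a minimal illustration with hypothetical `obs` and `pred` vectors; the actual cross-validation results are computed later in the report.

```r
# Minimal sketch of the comparison metrics, assuming cross-validation
# results are available as vectors of observations (obs) and predictions (pred):
mae <- function(obs, pred) mean(abs(pred - obs))        # mean absolute error
mre <- function(obs, pred) mean(abs(pred - obs) / obs)  # mean relative error
md  <- function(obs, pred) mean(pred - obs)             # mean deviation (bias)

obs  <- c(0.5, 1.0, 2.0)  # toy values for illustration only
pred <- c(0.6, 0.9, 2.2)
c(MAE = mae(obs, pred), MRE = mre(obs, pred), MD = md(obs, pred))
```

Note that MD keeps the sign of the residuals, so it indicates systematic over- or underestimation, while MAE and MRE measure the magnitude of the error.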
For the final model, grade-tonnage curves are computed to estimate the resource potential of the area. Since the final model uses simulation methods, an ensemble of grade-tonnage curves is produced to estimate the variation and uncertainty of the resource estimation.
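The principle of a single grade-tonnage curve can be sketched as below. This is a hedged illustration with hypothetical lognormal grades and equal block tonnages; repeating it over all simulated realisations yields the family of curves used later.

```r
# Sketch of a grade-tonnage computation for one simulated grade field,
# assuming equal block tonnages: for each cutoff, report the tonnage above
# cutoff and the mean grade of the retained blocks.
grade_tonnage <- function(grades, cutoffs, block_tonnes = 1) {
  data.frame(
    cutoff  = cutoffs,
    tonnage = sapply(cutoffs, function(z) sum(grades >= z) * block_tonnes),
    grade   = sapply(cutoffs, function(z) mean(grades[grades >= z]))
  )
}
set.seed(1)
sim_ni <- exp(rnorm(1000, mean = log(0.6), sd = 0.5))  # hypothetical Ni grades
gt <- grade_tonnage(sim_ni, cutoffs = seq(0.2, 1.4, by = 0.2))
```

By construction, the tonnage decreases and the mean grade increases with rising cutoff, which is the typical shape of such curves.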
The dataset is part of a study published as: [1] - Talebi, H., Mueller, U., Tolosana-Delgado, R. et al. Geostatistical Simulation of Geochemical Compositions in the Presence of Multiple Geological Units: Application to Mineral Resource Evaluation. Math Geosci 51, 129–153 (2019). https://doi.org/10.1007/s11004-018-9763-9
Notice: The nature of this work is largely driven by graphical assessments. To maintain a certain brevity, some subsections of this work are structured in parallel. This means that the reader can activate the subsection they wish to see. Code sections used to compile the calculations can be expanded by clicking on Code.
For the assembly of this work, the following packages are used:
#install.packages("funModeling")
#install.packages("gridExtra")
#install.packages("corrplot")
#install.packages("kableExtra")
#install.packages("vcd")
#install.packages("cvms") # for confusion metrics and matrices
#install.packages("plotly") # for interactive plots, not loaded into the namespace since it overwrites some dplyr functions
#install.packages("ggnewscale")
#install.packages("rasterly")
library("rasterly") # fast plotting of larger spatial datasets with plotly
library("ggnewscale") # allows combination of cont. and factorial data in one ggmap
library("sp") # contains spatial data classes
library("gstat") # classical geostatistics
library("RColorBrewer") # color palettes
library("magrittr") # pipe %>%
library("dplyr") # data manipulation verbs
library("gmGeostats") # gstat re-interfaced / for swath plots
library("ggplot2") # nicer plots
library("data.table") # for fast data frame manipulation
library("knitr") # for nicer report formatting
library("kableExtra") # for table formatting
library("corrplot") # for correlation visualisation
library("funModeling") # helper functions for exploratory data analysis
library("compositions") # compositional modelling
library("gridExtra") # for arranging multiple ggplots in a grid similar to par(mfrow)
library("vcd") # slightly improved optics for mosaic plot
library("readr") # more comfortable read csv fun
The provided dataset consists of 12 columns and 735 observations. By default, the columns X1 to Ni are imported as numeric. "Hard", "Lcode" and "subset" can be considered categorical variables and are therefore directly converted to factors for further use.
#import
dt <- read_csv("Grafe.csv") %>% as.data.table()
#convert columns to factor
fkt <- c("Hard", "Lcode", "subset" )
dt[,(fkt) := lapply(.SD, factor), .SDcols = fkt]
#print data table
dt %>% head() %>% kable(table.attr = "style = \"color: black;\"") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
| X1 | X | Y | Al | Co | Fe | Filler | Mg | Ni | Hard | Lcode | subset |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 23 | 1325 | 225 | 4.70 | 0.001 | 41.50 | 53.685 | 0.09 | 0.024 | 3 | FZ | training |
| 32 | 1325 | 325 | 1.43 | 0.011 | 11.93 | 78.699 | 7.70 | 0.230 | 0 | FZ | training |
| 52 | 1300 | 250 | 3.60 | 0.004 | 19.70 | 76.259 | 0.39 | 0.047 | 3 | FZ | training |
| 76 | 1275 | 175 | 5.10 | 0.031 | 30.90 | 62.909 | 0.64 | 0.420 | 3 | FZ | training |
| 100 | 1275 | 225 | 1.60 | 0.012 | 3.50 | 81.479 | 13.10 | 0.309 | 2 | FZ | training |
| 123 | 1275 | 275 | 2.80 | 0.029 | 17.20 | 71.044 | 8.24 | 0.687 | 1 | FZ | training |
Generally, four groups of data are present in the dataset:
The summary statistics for the numeric variables are shown in the following table.
options(knitr.kable.NA = '', digits=2, scipen=999)
sumstats <- profiling_num(dt) %>% #convenient function to calculate most standard summary stats
select(mean, std_dev, variation_coef, p_25, p_50, p_75, skewness, kurtosis) %>% #select wanted stats
cbind (.,data.frame(min = sapply(dt[,.SD, .SDcols = is.numeric], function(x) min(x, #add min / max
na.rm = T)),
max = sapply(dt[,.SD, .SDcols = is.numeric], function(x) max(x,
na.rm = T)) ))
names(sumstats) <- c("Mean", "Std. Dev.","Var. Coef.", "Q1", "Q2/Median", "Q3", "Skewness", "Kurtosis", "min", "max")
#improve look and print
sumstats %>%
kable( table.attr = "style = \"color: black;\"") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
| Mean | Std. Dev. | Var. Coef. | Q1 | Q2/Median | Q3 | Skewness | Kurtosis | min | max | |
|---|---|---|---|---|---|---|---|---|---|---|
| X1 | 9292.05 | 5313.11 | 0.57 | 4303.50 | 8693.00 | 14329.50 | -0.31 | 1.4 | 23.00 | 14513.0 |
| X | 589.97 | 346.76 | 0.59 | 300.00 | 550.00 | 850.00 | 0.32 | 2.0 | 0.00 | 1325.0 |
| Y | 355.07 | 174.26 | 0.49 | 225.00 | 325.00 | 475.00 | 0.54 | 2.5 | 25.00 | 850.0 |
| Al | 4.84 | 3.57 | 0.74 | 1.50 | 4.20 | 7.60 | 0.55 | 2.4 | 0.10 | 17.5 |
| Co | 0.07 | 0.12 | 1.85 | 0.02 | 0.03 | 0.06 | 5.35 | 43.4 | 0.00 | 1.4 |
| Fe | 23.21 | 13.04 | 0.56 | 10.60 | 22.30 | 34.80 | 0.20 | 1.7 | 1.00 | 54.9 |
| Filler | 65.78 | 10.63 | 0.16 | 56.34 | 67.64 | 74.01 | -0.10 | 2.1 | 40.33 | 93.8 |
| Mg | 5.40 | 6.31 | 1.17 | 0.24 | 1.41 | 11.14 | 0.85 | 2.3 | 0.06 | 24.1 |
| Ni | 0.70 | 0.51 | 0.72 | 0.33 | 0.58 | 0.97 | 1.24 | 4.7 | 0.01 | 3.0 |
It stands out that Co exhibits both a very high skewness and kurtosis. This points towards a highly skewed distribution with possible outliers. Fe and Filler exhibit a relatively low skewness. The distributions of the variables will be investigated further below. X1, the drillhole ID, will be dropped from now on.
Regarding the units of the variables in the dataset, it can be assumed that the values of the chemical elements are in weight percent and that the spatial coordinates are in meters. The exact coordinate system is not known, but it appears to be a local Cartesian coordinate system. It is not known whether X denotes northing or easting. For the sake of simplicity, it is taken to be the same as the traditional X-axis in plots.
Furthermore, the chemical variables and the filler sum up to 100 %. This allows the chemical components to be treated as a set of compositions.
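This closure can be verified directly. The following is a minimal check on two example rows taken from the data table shown above; in the full analysis it would be applied to all rows of the chemical columns.

```r
# Check the closure of the composition: Al + Co + Fe + Filler + Mg + Ni
# should sum to 100 % in each row (two example rows from the data table).
comp <- rbind(c(4.70, 0.001, 41.50, 53.685, 0.09, 0.024),
              c(1.43, 0.011, 11.93, 78.699, 7.70, 0.230))
rowSums(comp)  # both rows sum to 100
```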
While histograms are a common and familiar technique to evaluate distributions of data, they are harder to read when multiple groups are expressed in one plot. Because of this, density plots are chosen for visualization in this work. This allows for a grouping of the data according to their lithology.
#prepare molten data table
colnames.num <- names(dt[,.SD, .SDcols = is.numeric][,-c("X1", "X", "Y")])
dt.m <- melt.data.table(dt[,.SD, .SDcols = c(colnames.num, "Lcode")],
                        id.vars = c("Lcode"))
#draw density plots
a <- ggplot(data = dt.m, aes(x = value, fill = Lcode)) +
  geom_density(alpha = 0.3) +
  facet_grid(scales = "free", rows = vars(variable))
a
It shows that for all variables except Fe and Filler, logarithmic scales seem preferable. This is in line with the findings from the summary statistics. Hence, in the following, the logarithmized results are shown:
#prepare molten data table
colnames.num <- names(dt[,.SD, .SDcols = is.numeric][,-c("X1", "X", "Y")])
dt.m <- melt.data.table(dt[,.SD, .SDcols = c(colnames.num, "Lcode")],
                        id.vars = c("Lcode"))
#draw density plots of the log-transformed values
a <- ggplot(data = dt.m, aes(x = log(value), fill = Lcode)) +
  geom_density(alpha = 0.3) +
  facet_grid(scales = "free", rows = vars(variable))
a
Both the general density plots and the log-scaled density plots show that the distributions differ between the lithologies. For almost all elements, differences in the empirical density functions can be observed: both the spread and the measures of centrality differ. The observed differences are discussed further in the subsequent sections.
It can also be seen that a logarithmic scale provides distributions that are less skewed and closer to a normal distribution, which is a prerequisite for the kriging analysis.
To confirm this, QQ-plots are drawn for the untransformed data:
{par(mfcol=c(2,3))
qqnorm((dt$Al), col=2);qqline((dt$Al));legend(legend = "Al","topleft")
qqnorm((dt$Co), col=2);qqline((dt$Co));legend(legend = "Co","topleft")
qqnorm((dt$Fe), col=2);qqline((dt$Fe));legend(legend = "Fe","topleft")
qqnorm((dt$Filler), col=2);qqline((dt$Filler));legend(legend = "Filler","topleft")
qqnorm((dt$Mg), col=2);qqline((dt$Mg));legend(legend = "Mg","topleft")
qqnorm((dt$Ni), col=2);qqline((dt$Ni));legend(legend = "Ni","topleft")}
Large deviations can be seen for most distributions. Especially the tail behavior differs from that of a normal distribution. Therefore, the data are log-transformed:
dt.orig <- dt #backup for debugging
fkt <- c("Al", "Co","Fe", "Filler", "Mg", "Ni") #columns to transform
fkt_new <- c("Al_log", "Co_log","Fe_log", "Filler_log", "Mg_log", "Ni_log") #names of the transformed columns
dt <- dt[,(fkt_new) := lapply(.SD, log), .SDcols = fkt] #actual transformation with data.table
Now the QQ-plots for the log-transformed data are as follows:
{par(mfcol=c(2,3))
qqnorm((dt$Al_log), col=2);qqline((dt$Al_log));legend(legend = "Al_log","topleft")
qqnorm((dt$Co_log), col=2);qqline((dt$Co_log));legend(legend = "Co_log","topleft")
qqnorm((dt$Fe_log), col=2);qqline((dt$Fe_log));legend(legend = "Fe_log","topleft")
qqnorm((dt$Filler_log), col=2);qqline((dt$Filler_log));legend(legend = "Filler_log","topleft")
qqnorm((dt$Mg_log), col=2);qqline((dt$Mg_log));legend(legend = "Mg_log","topleft")
qqnorm((dt$Ni_log), col=2);qqline((dt$Ni_log));legend(legend = "Ni_log","topleft")}
It can be seen that for:
As a result, all variables, including Fe and Filler, will be used in their logarithmized version. This way, all variables can be compared with each other.
The summary for the categorical variables is shown below. Of relevance for the understanding of this dataset are the two variables Hard and Lcode. Additionally, the variable subset exists. It stores the information on which part of the data set is used for training and which for validation of the modeling process. It is split almost 50-50 across the dataset. The table below shows the number of members in the respective groups.
dt[,.SD, .SDcols = is.factor] %>% summary %>% kable(table.attr = "style = \"color: black;\"") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
| Hard | Lcode | subset | |
|---|---|---|---|
| 0: 53 | FZ:334 | training :371 | |
| 1:394 | SA: 65 | validation:364 | |
| 2:166 | SM:278 | ||
| 3:122 | UM: 58 |
The interrelationship of hardness and lithology is plotted with a mosaic plot. Here, all possible combinations of levels of Lcode and Hard are plotted against each other. The size of the cells along each of the two axes corresponds to the number of members of that level. Additionally, the Pearson residuals are plotted as blue and red shades. A positive Pearson residual corresponds to more members in this level combination than the null hypothesis would suggest. The null hypothesis is that no particular association exists between the groups. More information can be found in the supplement to the lecture [3] or in [4]. In addition to the mosaic plot, the numeric values of the group combinations are shown in the contingency table.
#draw mosaic plot
vcd::mosaic(~Hard+Lcode, data = dt, shade=TRUE, legend=TRUE, direction="v")
#calculate contingency table
ct <- dt[,.SD, .SDcols = c("Hard", "Lcode")] %>%
table() %>% as.data.frame.array()
#add colsums
csum <- ct %>% apply(.,2,sum)
ct["Sum",] <- csum
#add rowsums
rsum <- ct %>% apply(.,1,sum)
ct$Sum <- c(rsum)
#print
ct %>% kable(table.attr = "style = \"color: black;\"") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
| FZ | SA | SM | UM | Sum | |
|---|---|---|---|---|---|
| 0 | 19 | 10 | 21 | 3 | 53 |
| 1 | 169 | 43 | 181 | 1 | 394 |
| 2 | 81 | 6 | 63 | 16 | 166 |
| 3 | 65 | 6 | 13 | 38 | 122 |
| Sum | 334 | 65 | 278 | 58 | 735 |
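The Pearson residuals behind the mosaic shading can be reproduced with a standard chi-squared test; shown here on a toy 2x2 table for illustration, whereas in the report this corresponds to `table(dt$Hard, dt$Lcode)`.

```r
# Pearson residuals: (observed - expected) / sqrt(expected), where the
# expected counts follow from independence of the two factors (the null
# hypothesis of no association).
tab <- matrix(c(30, 10, 20, 40), nrow = 2)
xt  <- chisq.test(tab)
xt$residuals  # positive: more members than independence would suggest
```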
It shows that rock hardness 3 is positively associated with UM (fresh ultramafic zone) and negatively with SM (smectitic zone). Apart from that, weaker associations seem to exist between SA (saprolitic zone) and hardness 0 as well as between SM and hardness 1. The association of hardness level 3 in particular is logical, because fresh unweathered rock is generally harder and more robust than its weathered counterparts. In this dataset, the hardness level could also identify or incorporate indicators of the rock mass quality, as details of the nature of the variable Hard are not known.
To identify the association of the elements to the lithologies, boxplots after Tukey are produced:
#for grouped stats with ggplot, melted data tables are a convenient option
dt.m <- melt.data.table(dt[,.SD, .SDcols = c(colnames.num, "Lcode")],
id.vars = c("Lcode"))
ggplot(data = dt.m, aes( y=log(value),group=Lcode, fill=Lcode)) +
geom_boxplot( alpha = 0.3, notch = T)+
facet_wrap(scales = "free_y", shrink=F, ~variable, ncol=3)+
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank())
It can clearly be seen that the element contents differ between the rock masses. For Al and Fe, the highest values are found in zone FZ, followed by SA, SM and UM in decreasing order. Mg and Filler behave oppositely, with the lowest values found in FZ and with SA, SM and UM showing values in increasing order. Co and Ni show a different behavior: here, the highest values are found in SA, followed by SM. The association of SA with Ni and Co is already known from the geological description of the site and can be confirmed here. FZ and UM show similarly low values of the two elements. The notches give a graphical indication of whether the differences in the medians are significant. When the notches do not overlap, there is strong evidence that the medians differ. According to that, the differences between:
do not differ significantly.
The results show, however, that generally a relatively strong influence of the lithology on the mineralization can be assumed.
To identify whether similar findings can also be seen for the rock hardness, the procedure is repeated with the data grouped by hardness.
#for grouped stats with ggplot, melted data tables are a convenient option
dt.m <- melt.data.table(dt[,.SD, .SDcols = c(colnames.num, "Hard")],
id.vars = c("Hard"))
ggplot(data = dt.m, aes( y=log(value), fill=Hard)) +
geom_boxplot( alpha = 0.3, notch = T)+
facet_wrap(scales = "free", shrink=F, ~variable, ncol=3)
dt.m <- melt.data.table(dt[Lcode=="SA",.SD, .SDcols = c(colnames.num, "Hard")], # Example for detailed view for Lcode="SA"
id.vars = c("Hard"))
ggplot(data = dt.m, aes( y=log(value), fill=Hard)) +
geom_boxplot( alpha = 0.3, notch = F)+
facet_wrap(scales = "free", shrink=F, ~variable, ncol=3)+
labs(title = "Composition parts vs. Hardness for Lcode SA")
While some differences can be seen, the general picture is less distinct than for the rock type. UM tends to be the hardest zone and also tends to contain more Mg; however, this is not clearly reflected in the hardness groups.
For Co, the picture at first sight is very similar to the results from the boxplots grouped by Lcode. Zone SA (which is relatively strongly associated with hardness 0) showed the highest Co content. Here, however, hardness 1 shows the highest Co contents. So a simple relationship cannot be assumed. Cross-interrelations between lithology and hardness could also be suspected.
For example, within zone SA, an even stronger accumulation of Co could be associated with a certain rock hardness. This is shown as an example in the second boxplot. There, the notches are removed because the results are partially so skewed that the notches would extend beyond Q1 and Q3, respectively. Generally, a similar picture can be observed for the elements Co, Mg and Ni: a stronger accumulation of these elements occurs at hardness 1. The composition of a rock mass can influence its hardness even on a small scale. Some indicators of this during mechanical excavation were found by Rimos et al. [3] during a master thesis at TU BAF. There, the cutting resistance of pyritic Pb-Zn ore samples from Reiche Zeche was tested for cutting with conical picks. A higher cutting resistance showed in areas with increased Ca contents. The element contents were approximated in a dense pattern by means of a handheld XRF device.
However, due to the complexity this would add to the model building, this influence is dropped, as it would exceed the scope of this work.
# this is completed if necessary
# options(scipen = 5)
# i <- 1
# for (i in 1:length(levels(dt$Lcode))) {}
# contrasts(dt$Lcode) <- contr.treatment(levels(dt$Lcode), base=i)
# u <- levels(dt$Lcode)[[i]]
# lm1 <- lm(dt,formula = Mg~Lcode)
# lm1 <- lm1%>%summary
#
# names(lm1[["coefficients"]][,1])
# lm1[["coefficients"]][,1]
# lm1[["coefficients"]][,4]
The correlation and covariance statistics for the numeric values are presented as follows. The correlation is calculated twofold: as Spearman correlation and as Pearson correlation. The Pearson correlation coefficient measures the goodness of fit of a linear regression between two variables; a Pearson coefficient of 1 equals an R² of 1 for a linear regression. The Spearman correlation coefficient follows the same basic principle but considers the ranks of the data pairs instead of their actual values. As such, a Spearman correlation of 1 means that the relation between the covariates follows a perfectly monotonically increasing function (-1 would indicate a decreasing function). Consequently, a high Spearman and a low Pearson coefficient could indicate, e.g., a quadratic relation. A more detailed description of the two methods can be found, e.g., in [2].
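This difference between the two coefficients is easy to reproduce with synthetic data: y = x² is monotonic on positive x, so Spearman is exactly 1 while Pearson stays below 1.

```r
# Monotonic but non-linear relation: Spearman (rank-based) reaches 1,
# Pearson (linear fit quality) does not.
x <- 1:20
y <- x^2
cor(x, y, method = "spearman")  # 1
cor(x, y, method = "pearson")   # < 1
```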
Values in the upper triangle show the respective Spearman correlation coefficients, values in the lower part show the Pearson correlation coefficients. Correlations that are not significant at a level of 0.05 are marked with a black cross and can be considered insignificant.
All calculations are done with the logarithmized values, as these will be used for the further interpolation, except for the spatial variables X and Y. For comparison, the covariance matrix is also computed.
use <- cbind(dt[,2:3], dt[,4:9] %>% log())
dt.ptest.sp <- cor.mtest(use,method="spearman")
dt.cor.sp <- cor(use,method = "spearman")
dt.ptest.p <- cor.mtest(use,method="pearson")
dt.cor.p <- cor(use,method = "pearson")
corrplot(dt.cor.sp,p.mat = dt.ptest.sp$p,
method = "color", outline=T,
type="upper",
addgrid.col = "grey", tl.col = "black",
tl.pos= "td",
addCoef.col="black",
sig.level = 0.05,
order="original", hclust.method = "complete", addrect = 6, rect.col = "black",
mar =c(1,0,2,1))
corrplot(dt.cor.p,p.mat = dt.ptest.p$p,
method = "color", outline=T,
type="lower",
addgrid.col = "grey", tl.col = "black",
tl.pos= "lt",
addCoef.col="black",
cl.pos="n",
sig.level = 0.05,
order="original", hclust.method = "complete", addrect = 6, rect.col = "black",
mar =c(1,0,1,1), add=T)
mtext(side = 3,line = 2, "Top: Spearman")
mtext(side = 2, "Bottom: Pearson")
Generally, it can be seen that the Pearson and Spearman correlations give similar results. In some cases, the Spearman correlation is higher than Pearson; however, the opposite can also be observed. When the Pearson correlation is higher than Spearman, this points to deviations from linearity that are so small that they reduce the rank-based Spearman correlation more than Pearson. When the Spearman correlation is higher, a monotonic but less linear relation is present.
The variables X and Y show very low correlations with the geochemical variables, some of which are also insignificant. This suggests the absence of a general spatial trend, which will be investigated later in more detail.
Some correlation pairs show a higher Spearman correlation. This applies mainly to Mg/Ni, Al/Ni and Filler/Fe. Others, like Mg/Filler and Fe/Al, show a higher Pearson correlation.
Generally within the group of chemical variables, the pairs show the following noteworthy correlations:
Noteworthy is that Al correlates positively only with Fe and negatively with Filler. This suggests that it occurs as a companion of the iron and not as its own silicate mineralization, which is in line with the geological description after [1], which states a lateritic zone. Co and Ni show a positive relation, while Ni shows a negative relation to Al. This suggests that these two form their own mineralization complex.
In the following, the covariance matrix for the chemical elements (log-transformed) only is shown.
use <- dt[,4:9] %>% log()
dt.cov <- cov(use,method = "pearson")
corrplot(dt.cov,
method = "color", outline=T,
type="upper",
addgrid.col = "grey", tl.col = "black",
tl.pos= "td",
addCoef.col="black",
sig.level = 0.05,
order="original", hclust.method = "complete", addrect = 6, rect.col = "black",
mar =c(1,0,2,1),is.corr = F,
number.digits=2, number.cex = 0.8,
cl.align.text = "l")
The matrix shows similar results to the correlation analysis. However, it also shows the magnitude of variation, since it is a non-standardized version of the correlation. As such, Fe and Filler generally do not vary much, although they show the highest absolute values. The highest absolute covariances appear for the pairs Al-Mg and Fe-Mg; since the values are negative, the correlations are also negative. Also noteworthy is the pair Fe/Filler, which shows a strong correlation but a weak covariance. This can be attributed to the low variance of Filler.
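The relation used here is simply that the covariance is the non-standardized correlation; a minimal numeric check with synthetic data:

```r
# cov(x, y) = cor(x, y) * sd(x) * sd(y): a strong correlation paired with a
# small standard deviation still yields a small covariance (as observed for
# the pair Fe/Filler above).
set.seed(42)
x <- rnorm(100)
y <- 0.9 * x + rnorm(100, sd = 0.1)  # strongly correlated toy pair
all.equal(cov(x, y), cor(x, y) * sd(x) * sd(y))  # TRUE
```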
Swath plots are used to assess whether a trend exists in the spatial directions. Here, the element contents are plotted against the X and Y axes, respectively, and overlaid with a moving average estimate, in this case a loess estimator. The swath plots are shown in the following for the elements irrespective of their rock class.
A scatter plot matrix can shed more light on the cross-relations between the covariates. It can give a broad understanding of the situation and allows a more detailed picture than a pure correlation analysis. Here, the data are grouped according to rock type to identify whether the correlation behavior differs between the rock types. In the upper part, smoothed (loess) regression lines are drawn. On the diagonal, the density plots already shown previously are drawn. In the lower section, raw scatterplots are drawn.
X <- dt[,c("X", "Y")]
swath(data = dt[,..fkt_new], loc=X$X)
For Mg_log, an outlying, more variable area can be seen at low X-values. Slight trends can also be observed for Fe, Filler and, to an extent, Ni. It can also be seen that some elements, in particular Mg, seem to appear in at least two clusters: one with higher values and one with lower values.
X <- dt[,c("X", "Y")]
swath(data = dt[,..fkt_new], loc=X$Y, xlab = "Y")
In the Y-direction, stronger local trends appear at low Y-values. Here, too, the appearance of clusters can be suspected.
library(GGally)
bigplot <- ggpairs(data = dt,columns = c(2,3,13:18),
mapping = ggplot2::aes(colour=dt$Lcode),
lower = list(continuous = wrap("points", alpha = 0.8, size=0.3)),
diag = list(discrete="barDiag",
continuous = wrap("densityDiag", alpha=0.5 )),
upper = list(continuous = wrap("smooth_loess", alpha = 0.1, size=0.05)
)
) +
theme(panel.grid.major = element_blank())
(bigplot)
The scatter plot matrix shows a more detailed picture of the situation. Also, since the variables are grouped by Lcode, a more detailed picture of its influence can be drawn.
The following points can be summarized from the scatter plot matrix:
The clear correlation between Fe and Filler is repeated
The scatter plots indicate clustered structures according to the rock type, where the behavior can differ considerably between the rock types. Comparing the regression lines with the correlation coefficients and the scatter plots shows that for most pairs the regression relationships are relatively weak and the data appear in clusters rather than trends.
The correlation between the spatial variables X and Y and the chemical variables shows a curious effect. While the swath plots showed certain trends for some variables, the grouped scatterplots shed more light on these effects: there, the dominating factor defining the element contents seems to be the Lcode. Because the Lcode is distributed heterogeneously over the study area, an apparent trend can be seen. As such, instead of incorporating a static trend, a correlation of the rock type with the element contents will be considered. Within the Lcodes, only partial local trends can be seen.
These findings draw a somewhat unclear picture. While the swath plots indicate a certain trending behavior in the Y-direction, a more diverse and noisy behavior shows in the grouped plots. The presence of a trend largely defines the extrapolation behavior. Since extrapolation is not an aim of this study and the sampling density is rather high, no trend will be used for the subsequent analysis. The correlation of Lcode with the element contents is incorporated in the analysis variants VK2 and VS2.
In order to shed more light on possible cluster behavior, a k-means cluster analysis could provide more insight into the situation. It is not included in this report to maintain brevity.
In the following, the spatial distribution of the input variables is described. The categorical and continuous variables are first shown separately and then combined in the final subsection. Furthermore, the framing parameters for the semivariographic analysis, which is conducted prior to the actual geospatial modeling, are estimated.
The following maps plot the variables Hard and Lcode as dot plots. The X- and Y-axes represent the respective coordinates in the input data. Only the training data are plotted.
dt.m <- melt.data.table(dt[subset=="training",.SD, .SDcols = c("Lcode", "Hard", "X", "Y")],
                        id.vars = c("X", "Y"))
temp <- list()
for (i in unique(dt.m$variable)) {
temp[[i]] <- ggplot(dt.m[variable==(i)], aes(x = X, y = Y, color = value)) +
geom_point(shape=19)+
coord_fixed()+
ggtitle(label = i) #+
#scale_color_gradient(low = "blue2",high = "red2")
}
do.call("grid.arrange", c(temp, ncol=2))
# par(mfrow=c(1,2))
#
# plot(Y~X, data=dt, pch=15, col=1+as.integer(Hard), asp=1, cex=0.5)
# legend("topleft", legend=levels(dt$Hard), fill=2:6)
#
# plot(Y~X, data=dt, pch=15, col=1+as.integer(Lcode), asp=1, cex=0.5)
# legend("topleft", legend=levels(dt$Lcode), fill=2:6)
It can be seen that Hard is distributed relatively evenly. However, hardness 2 and 3 seem to accumulate towards the east of the field. For Lcode, an accumulation of FZ can be seen in a central E-W band. SM accumulates more in the west and on the N and S borders of the project. Also of importance are the zones UM and SA, which are relatively sparsely distributed. UM can be found in thin bands along the ESE-WNW axis. SA can be found in single patches in areas where FZ and SM seem to show contact zones.
To obtain an understanding of the spatial distribution of the different elements, the values are plotted as follows:
dt.m <- melt.data.table(dt[,.SD, .SDcols = c(fkt_new, "X", "Y")],
                        id.vars = c("X", "Y"))
temp <- list()
for (i in unique(dt.m$variable)) {
temp[[i]] <- ggplot(dt.m[variable==(i)], aes(x = X, y = Y, color = value)) +
geom_point()+
coord_fixed()+
ggtitle(label = i)+
scale_color_gradientn(colors = c("blue2","purple", "red2"))
}
do.call("grid.arrange", c(temp, ncol=3))
It can be seen from the maps that areas with high concentrations of Al tend to show low concentrations of Co, Mg and, to some extent, Ni. The Filler variable also exhibits this behavior. Generally, it seems that two different groups of mineralization exist: one with high Al and Fe concentrations and one with high Mg, Ni, Co and Filler concentrations. This is in line with the earlier findings, where the Lcode corresponds with certain element levels.
For a better visualization of the element distribution in relation to the Lcode, a bubble plot is a good tool. Here, the assay locations are drawn as bubbles; their color indicates the rock type and their size the element content. As examples, the plots for Mg and Ni are shown.
a <- ggplot(data=dt, aes(x = X, y=Y, size=Mg_log, color=Lcode))+
  geom_point(alpha=0.7)+
  coord_fixed()+
  ggtitle(label = "Mg_log")
b <- ggplot(data=dt, aes(x = X, y=Y, size=Ni_log, color=Lcode))+
  geom_point(alpha=0.7)+
  coord_fixed()+
  ggtitle(label = "Ni_log")
grid.arrange(a, b, nrow=1)
For Mg, it can be seen that the ferruginous zone generally shows lower contents, while higher values can be observed in the other three zones. For Ni, the picture is less clear; however, here too, lower contents seem to appear in zone FZ.
For the subsequent geostatistical modeling procedures, the bin width and the maximum variogram distance are estimated as follows. For this estimation, only the assay points of the "training" part of the data set are used.
The minimum distance between sample sites is ca. 35 m. This value will be used as the bin size. The maximum can be estimated with the median distance between assay sites or graphically with the radius of the biggest circle that fits within the boundaries of the survey area. The median distance is 431.57 m. The radius of the biggest circle fitting in the study area is estimated at ca. 230 m. As a trade-off between the two values, a cutoff of 300 m is chosen, since the area has a relatively elongated shape.
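The two distance estimates can be sketched with base R. Toy coordinates are shown here for illustration; in the report this would run on the training subset of the X and Y columns.

```r
# Minimum and median pairwise Euclidean distances between sample sites as
# candidates for the bin width and the maximum variogram distance.
xy <- data.frame(X = c(0, 35, 100, 300), Y = c(0, 0, 50, 200))
dv <- as.numeric(dist(xy))  # all pairwise distances
min(dv)     # candidate bin width
median(dv)  # one candidate for the variogram cutoff
```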
For reference, the subsequent plot shows the training and validation data locations:
a <- ggplot(dt, aes(x = X, y = Y, color = subset)) +
geom_point()+
coord_fixed()+
ggtitle(label = "Interactive Plot Training/Validation Data Set")
plotly::ggplotly(a)
For the interpolation of the categorical variable, ordinary indicator kriging is used. No cokriging is used in this case; the indicator variables are modeled individually. (A version of indicator cokriging is included in the mixed-model cokriging, VK2.) Here, anisotropic variograms were fitted sequentially, so no interaction between the different rock zones was assumed. This allows a more exact fitting of the single semivariogram models at the cost of neglecting possible relationships between them. It also allows a comparison between the interpolation results of the mixed model and the sequential indicator kriging. Shown below is one of the fitted variograms.
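Formally, for each lithology class $k$ a binary indicator is interpolated; its ordinary-kriging estimate at an unsampled location can be read as an approximate class probability (standard formulation, matching the `(Lcode=="FZ")~1` formulas in the code below):

```latex
I_k(x_i) = \begin{cases} 1 & \text{if } \mathrm{Lcode}(x_i) = k \\ 0 & \text{otherwise} \end{cases}
\qquad
\hat{p}_k(x_0) = \sum_{i=1}^{n} \lambda_i \, I_k(x_i), \quad \sum_{i} \lambda_i = 1
```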
#prepare grid for all further use
options(scipen = 5, digits = 6)
extends <-dt[,c("X", "Y")] %>% sapply(range) %>% as.data.frame()
x = seq(from=min(extends$X)-100, to=max(extends$X)+100, by=10)
y = seq(from=min(extends$Y)-100, to=max(extends$Y)+100, by=10)
xy_grid = expand.grid(X=x, Y=y)
# create the gstat containers
gs_i_FZ <- gstat(id="FZ", formula = (Lcode=="FZ")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 300)
gs_i_SA <- gstat(id="SA", formula = (Lcode=="SA")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 300)
gs_i_SM <- gstat(id="SM", formula = (Lcode=="SM")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 300)
gs_i_UM <- gstat(id="UM", formula = (Lcode=="UM")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 300)
Lcodes <- c("FZ", "SA", "SM", "UM")
#initiate the variograms as loop for each LCode
for (i in c("FZ", "SA", "SM", "UM")){
assign(paste0("vg_i_", i),
variogram(get(paste0("gs_i_", i)),
width=35, cutoff=300, alpha=c(0:5)*30 ))
}
#fit FZ
#plot(vg_i_FZ)
vgt_FZ <- vgm(psill=0.1, model="Wav", range=120, nugget=0,anis = c(40,0.4),
add.to = vgm(psill=0.15, range=350, nugget=0.1, model="Gau"))
# plot(vg_i_FZ, model=vgt_FZ)
# fit SA
#plot(vg_i_SA)
vgt_SA <- vgm(psill=0.05, model="Wav", range=50, nugget=0.03,anis = c(0,0.4))
#plot(vg_i_SA, model=vgt_SA)
# fit SM
# plot(vg_i_SM)
vgt_SM <- vgm(psill=0.05, model="Wav", range=80, nugget=0.03,anis = c(0,0.4),
add.to = vgm(psill=0.15, range=350, nugget=0.1, model="Exp"))
#plot(vg_i_SM, model=vgt_SM)
#fit UM
# plot(vg_i_UM)
vgt_UM <- vgm(psill=0.05, model="Exp", range=150, nugget=0.02,anis = c(120,0.3),
add.to = vgm(psill=0.01, range=50, nugget=0, model="Wav",anis = c(30,0.3)))
plot(vg_i_UM, model=vgt_UM)
The figure shows the fitted variogram model for UM. To fit it, an overlay of an exponential and a wave function was used with varying azimuths and anisotropy ratios: the exponential part has an azimuth of 120° and the wave part one of 30°. The figure also shows that a highly accurate fit could not be achieved; however, the main characteristics of the variogram structure could be captured. In summary, the following variogram models (as defined in the code above; structures given as model (partial sill, range)) were used:

| Lcode | Structures | Total nugget | Anisotropy (azimuth / ratio) |
|---|---|---|---|
| FZ | Wav (0.10, 120 m) + Gau (0.15, 350 m) | 0.10 | 40° / 0.4 |
| SA | Wav (0.05, 50 m) | 0.03 | 0° / 0.4 |
| SM | Wav (0.05, 80 m) + Exp (0.15, 350 m) | 0.13 | 0° / 0.4 |
| UM | Exp (0.05, 150 m) + Wav (0.01, 50 m) | 0.02 | 120° / 0.3 and 30° / 0.3 |
With the fitted variograms, the actual indicator kriging is conducted:
#update gstats
for (i in Lcodes){
assign(paste0("gs_i_", i),
gstat(id=i,
formula = (Lcode==paste0(i))~1,
locations = ~X+Y,
data=dt[subset=="training"],
model = get(paste0("vgt_",i)),
nmax=30)
)
}
#kriging
for (i in Lcodes){
assign(paste0("krig_", i),
predict(get(paste0("gs_i_", i)),
newdata=xy_grid)
)
}
## [using ordinary kriging]
krig_i_s <- cbind(krig_FZ, krig_SA[,3:4], krig_SM[,3:4], krig_UM[3:4])
krig_i_s$Lcode <- krig_i_s %>% select(contains(".pred")) %>% max.col %>%
factor(x = .,levels = c(1,2,3,4),labels = Lcodes)
krig_i_s %<>% as.data.table()
# applying variance cutoff of Q0.90
var_cutoff <- krig_i_s$FZ.var %>% quantile(0.90, na.rm = T)
krig_i_s <- krig_i_s[FZ.var < var_cutoff]
#Plotting the results
a <- ggplot(data = krig_i_s, aes(x=X, y=Y, fill=Lcode))+
geom_raster()+
coord_fixed()+
geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y, fill=Lcode), size=3, shape=21, alpha=0.7)+
labs(fill="Lcode", title = "Lcode")
a
The image shows the interpolated results as well as the original sampling points. For the SA areas, small “islands” appear. For UM, which is also relatively scarce, smaller patchy areas are modeled. At some sampling points, the modeled patches are so small that they visually do not extend beyond the overlain sampling points.
The quality of the interpolation is quantified with a confusion matrix. The confusion matrix shows the counts in the respective fields, the sum of all predictions per class as well as the sum of the true classes (“Target”). The percentage values show the sensitivity for the respective class; sensitivity is defined as the number of true positive predictions divided by the sum of true positives and false negatives. Additionally, the overall accuracy is calculated (not shown in the plot) as the number of correct predictions divided by the total number of cases [5].
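In formulas (with $TP_k$ and $FN_k$ the true positives and false negatives of class $k$, and $N$ the total number of validation cases; this notation is introduced here only for clarity):

```latex
\mathrm{Sensitivity}_k = \frac{TP_k}{TP_k + FN_k},
\qquad
\mathrm{Accuracy} = \frac{\sum_k TP_k}{N}
```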
#kriging for validation data
for (i in Lcodes){
assign(paste0("krig_val_", i),
predict(get(paste0("gs_i_", i)),
newdata=dt[subset=="validation",])
)
}
## [inverse distance weighted interpolation]
krig_val_i_s <- cbind(krig_val_FZ, krig_val_SA[,3:4], krig_val_SM[,3:4], krig_val_UM[3:4])
krig_val_i_s$Lcode <- krig_val_i_s %>% select(contains(".pred")) %>% max.col %>%
factor(x = .,levels = c(1,2,3,4),labels = Lcodes)
krig_val_i_s %<>% as.data.table()
CM_i <- cvms::confusion_matrix(targets =dt[subset=="validation", Lcode], predictions = krig_val_i_s$Lcode )
cvms::plot_confusion_matrix(CM_i$`Confusion Matrix`[[1]],
add_sums = TRUE,
add_normalized = F,
diag_percentages_only = T, add_row_percentages = F)
It can be seen that the accuracy is relatively mediocre: the overall accuracy is 58.8%.
In the context of accuracy, it must also be mentioned that the classes UM and SA appear relatively sparsely and their structures are small/thin. As such, a certain error can be expected when comparing to the validation set.
In this chapter, omnidirectional cokriging with the log-transformed values is conducted. Since this model is supposed to serve as a “simplistic” baseline, neither anisotropy nor directional trends are incorporated.
As a first approach to the modeling of the data set, the omnidirectional variograms are fitted as follows. Since the variogram matrix is rather big, use the show/hide button to display the plot.
#variables to use
vars_l = c("Al_log", "Co_log", "Fe_log", "Filler_log", "Mg_log", "Ni_log")
#loop for allocating values to gstat object - only training data
#--> slight change to script: instead of prepopulating gs_l, it is initialized as NULL; this way the specifics of the variable reading have to be typed only once
{gs_l=NULL
for(j in vars_l){
frm = as.formula(paste(j,1, sep="~"))
print(frm)
gs_l = gstat(gs_l, id=j, formula=frm, locations=~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 300)
}
}
## Al_log ~ 1
## Co_log ~ 1
## Fe_log ~ 1
## Filler_log ~ 1
## Mg_log ~ 1
## Ni_log ~ 1
vge_l = gstat::variogram(gs_l,width=35, cutoff= 300) # create variograms
#create initial varmodels
vgt_l = vgm(psill=0.5, range=120, nugget=1, model="Sph", add.to=vgm(psill=0.8, range=300, model="Gau"))
#fitting
gs_l = gstat::fit.lmc(v=vge_l, g = gs_l, model = vgt_l, correct.diagonal = 1.00001)
#plot varplots + fits
plot(vge_l, model = gs_l$model)
Most variogram fits are considered sufficient. Some element combinations show problematic fits; however, their weight in the final result is considered to be very small. As such, the fits are accepted and the actual cokriging commences.
Predictions with a variance greater than the 0.95-quantile of the variance of Fe are filtered out for display and all further purposes.
#reusing the xy_grid created earlier (100 m margin around the survey extent, 10 m spacing)
#prediction
cokriged = predict(gs_l, newdata=xy_grid, debug.level=-1, nmax=20, omax = 6)
## Linear Model of Coregionalization found. Good.
## [using ordinary cokriging]
#cokriged %>% select(contains(".var")) %>% summary # for debugging
#sets the variance cutoff above which values are omitted
var_cutoff <- cokriged$Fe_log.var %>% quantile(0.95, na.rm = T)
#omitting the respective values
cokriged <- cokriged %>% as.data.table()
cokriged <- cokriged[Fe_log.var < var_cutoff]
cokriged_val <- predict(gs_l, newdata=dt[subset=="validation", c("X", "Y")], debug.level=-1, nmax=20, omax = 6)
## Linear Model of Coregionalization found. Good.
## [using ordinary cokriging]
#storing backtransformed kriging values
cokriged2 <- cokriged %>% select(contains(".pred")) %>% exp %>% cbind(cokriged[,c("X", "Y")], .)
Shown below are the interpolation results for the omnidirectional cokriging for all elements and Filler. The values are backtransformed into normal space.
a <- ggplot(data = cokriged2, aes(x=X, y=Y, fill=Al_log.pred))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="Al [%]")
a
Generally, only a few zones with higher Al contents exist, reaching up to 12.5 %.
a <- ggplot(data = cokriged2, aes(x=X, y=Y, fill=Co_log.pred))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="Co [%]")
a
Co is relatively scarce, with only three smaller areas where it is concentrated above 0.1 %. Especially one area in the south-west stands out.
a <- ggplot(data = cokriged2, aes(x=X, y=Y, fill=Fe_log.pred))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="Fe [%]")
a
Iron is relatively abundant in the deposit, with large areas containing above 35% iron in the east and west as well as small strips in the north.
a <- ggplot(data = cokriged2, aes(x=X, y=Y, fill=Filler_log.pred))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="Filler [%]")
a
While Filler is not a real chemical variable, it is shown for reference purposes. It shows an opposite behavior to iron. This is logical, since iron is the most abundant element and the covariates form a composition.
a <- ggplot(data = cokriged2, aes(x=X, y=Y, fill=Mg_log.pred))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="Mg [%]")
a
Mg is relatively scarce in large areas, with the main area below 10%. However, small strips in the central north as well as in the south show concentrations of up to 30%.
a <- ggplot(data = cokriged2, aes(x=X, y=Y, fill=Ni_log.pred))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="Ni [%]")
a
Ni shows several areas with higher contents, especially in the west with up to 1.5% Ni. Higher concentrations can also be found in the center and southeast.
In this chapter, the mixed-model cokriging is shown. Here, all chemical variables including the Filler variable and the Lcode are combined in one model. No trend is incorporated: according to Journel and Rossi 1989 [6], kriging with a trend is mainly of importance when extrapolation is sought, which is not the case here. Anisotropy, however, is incorporated in the model.
Note: In tests, adding a trend introduced an odd spiking behavior in the predictions; the trend was therefore omitted.
#populate gstat with num-vars
{gs_cs=NULL
for(j in vars_l){
frm = as.formula(paste(j,"1", sep="~"))
gs_cs = gstat(gs_cs, id=j, formula=frm, locations=~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 200,nmin = 5)
}
}
#add cat. levels
gs_cs <- gstat(id="FZ", formula = (Lcode=="FZ")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 200,g = gs_cs, nmin = 5) %>%
gstat(id="SA", formula = (Lcode=="SA")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 200, nmin = 5) %>%
gstat(id="SM", formula = (Lcode=="SM")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 200, nmin = 5) %>%
gstat(id="UM", formula = (Lcode=="UM")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 200,nmin = 5)
The modeling procedure is as follows: First, the empirical variograms are evaluated. Then, using the varioApplet provided during the short course sessions, a preliminary anisotropy fit is made such that it provides a sufficient fit for the single elements. The extracted anisotropy parameters appear as `anis` in the code below (azimuth 156°, ratio 0.52).
With these values, the autofit function is used to fit the remaining parameters. For the fitting, a spherical model is applied.
The following plot shows the empirical variogram with the fitted anisotropic model for Ni_log. The azimuth was varied in 30° steps; the horizontal angular tolerance equals 15° (default).
# empirical variogram
vg_cs <- variogram(gs_cs,width=30, cutoff=300, alpha=c(0:5)*30)
anis <- c(156,0.52) #anisotropy values from applet
#fit model
vg_cs_m = vgm(psill=0.9, range=300, nugget=1,anis= anis, model="Sph")
gs_cs= gstat::fit.lmc(v=vg_cs, model=vg_cs_m, g=gs_cs, correct.diagonal = 1.0001)
#gs_cs$model <- model4
##draw selected plot with code from applet
# number of directions
ndirs = vg_cs$dir.hor %>% unique %>% length
# color scale
col_dirs = RColorBrewer::brewer.pal(ndirs,"Set1")
# plot
#for which variable?
showel <- "Ni_log"
vg0 = vg_cs[vg_cs$id==paste(showel),]
plot(gamma~dist, data=vg0, pch=21, cex=1.2,
bg=col_dirs[as.factor(dir.hor)],
xlim=range(0,vg0$dist), ylim=range(0, vg0$gamma))
vg0[, c("dist", "gamma", "dir.hor")] %>%
split(as.factor(vg0$dir.hor)) %>% # vg0 (not vg_cs), so the factor matches the subset
sapply(lines, col="grey") %>% invisible() # invisible() suppresses the NULL output
myfun = function(x){
lines(x[,c(1,2)], col=col_dirs[x$dir.hor/30+1], lty=2)
}
vg0[,c("dist", "gamma", "dir.hor")] %>%
split(as.factor(vg0$dir.hor)) %>% sapply(myfun) %>% invisible()
for(i in 1:6){
angle = pi/2 - ((i-1)*30)*pi/180
direction = c(sin(angle), cos(angle) ,0)
variodens = variogramLine(gs_cs$model[[paste(showel)]], maxdist= 1.1*max(vg_cs$dist), dir=direction)
lines(variodens, col=col_dirs[i], lwd=2)
}
legend(x = "bottomright", legend = vg_cs$dir.hor %>% unique %>% paste(., "°") , fill=col_dirs )
It can be seen that the fit is not perfect, but the general features of the variography are captured. The anisotropy angle varies slightly between the elements; as such, a deviation can be seen e.g. for the shown Ni_log. With the fitted semivariogram models, the actual cokriging can be executed.
cok_cs = predict(gs_cs, newdata=xy_grid, debug.level = -1, nmax=20, omax = 6 ) %>% as.data.table()
## Linear Model of Coregionalization found. Good.
## [using ordinary cokriging]
#calculation of variance cutoff
var_cutoff <- cok_cs$Fe_log.var %>% quantile(0.90, na.rm = T)
#omitting the respective values
cok_cs <- cok_cs[Fe_log.var < var_cutoff]
#predictions for evaluation set
cok_cs_val <- predict(gs_cs, newdata=dt[subset=="validation", c("X", "Y")], debug.level=-1, nmax=20, omax = 6)
## Linear Model of Coregionalization found. Good.
## [using ordinary cokriging]
cok_cs_val$Lcode <- cok_cs_val %>% select(paste0(Lcodes, ".pred")) %>% max.col() %>%
factor(x = .,levels = c(1,2,3,4),labels = Lcodes)
#calculating the most probable Lcode from the results
Lcodes <- c("FZ", "SA", "SM", "UM")
cok_cs$Lcode <- cok_cs %>% select(paste0(Lcodes, ".pred")) %>% max.col() %>%
factor(x = .,levels = c(1,2,3,4),labels = Lcodes)
#kriging results of backtransformed values
cok_cs2 <- cok_cs %>% select(paste0(fkt_new,".pred")) %>% exp %>% cbind(cok_cs[,c("X", "Y")], .)
The figure below shows the results for the element Ni (backtransformed into normal space), the spatial distribution of the variance as well as the modeling results of Lcode.
a <- ggplot(data = cok_cs2, aes(x=X, y=Y, fill=Ni_log.pred))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=0.5)+
labs(fill="Ni [%]", title = "Backtransformed Ni")
b <- ggplot(data = cok_cs, aes(x=X, y=Y, fill=Lcode))+
geom_raster()+
coord_fixed()+
geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y, fill=Lcode), size=0.7, shape=21)+
labs(fill="Lcode", title = "Lcode")
c <- ggplot(data = cok_cs, aes(x=X, y=Y, fill=Ni_log.var))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=0.5)+
labs(fill="Var", title = "Variance Ni_log")
grid.arrange(a,b,c, nrow=2)
The second plot reveals a problematic situation with Lcode. Since the sampling sites for SA are relatively sparsely distributed, the variogram could not capture the spatial dependence even at the minimum bin size (this can be seen in the variogram for SA). As a result, even in cells that are very close to SA-sampled drillholes, SA is not the “most probable” Lcode. This leads to rather odd results for the validation of Lcode, as shown in the following:
CM_m <- cvms::confusion_matrix(targets =dt[subset=="validation", Lcode], predictions = cok_cs_val$Lcode )
cvms::plot_confusion_matrix(CM_m$`Confusion Matrix`[[1]],
add_sums = TRUE,
add_normalized = F,
diag_percentages_only = T,add_row_percentages = F)
It can be seen that no validation data points were classified as UM or SA, although 27 and 32 members, respectively, exist in these classes. The sensitivity values lie at 60.67% for SM and 0.76% for FZ. The overall accuracy lies at 60.2%, meaning the accuracy of this model is marginally better (by 1.4%). This is remarkable given the total misclassification of the minor groups UM and SA and can be attributed to their relatively low contribution to the total sample count.
With compositional modeling, the covariates are treated as parts of a whole that sum to one: when one variable decreases, one or several others have to increase in order to maintain the closure condition. As such, the compositions form a multidimensional simplex. To illustrate this, ternary diagrams for the abundant variables Fe, Mg and Al, as well as for the elements Co, Ni and Fe, are shown below. The diagram scales were centered, so that the sparse variables can visually be compared to the abundant Fe.
dtcomp0 <- dt %>% select(c("Fe", "Mg", "Al")) %>% acomp
plot.acomp(dtcomp0,center = T)
dtcomp0 <- dt %>% select(c("Co", "Ni", "Fe")) %>% acomp
plot.acomp(dtcomp0,center = T)
It shows, for example, that very high values of Mg correspond to low to medium Al values. Also, rising Mg values up to a certain point coincide with a relatively stable ratio of Fe to Al.
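The logratio modeling of these compositions relies on the additive logratio (alr) transform (used later via `V = "alr"`), which maps a $D$-part composition to $D-1$ unconstrained coordinates. The standard definition, with the last part as denominator, is:

```latex
\operatorname{alr}(\mathbf{x}) = \left(\ln\frac{x_1}{x_D},\,\ldots,\,\ln\frac{x_{D-1}}{x_D}\right),
\qquad
\operatorname{alr}^{-1}(\boldsymbol{\xi}) = \mathcal{C}\!\left(e^{\xi_1},\ldots,e^{\xi_{D-1}},\,1\right)
```

where $\mathcal{C}$ denotes the closure operation (rescaling to unit sum).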
Subsequently, swath plots of the compositions are used to assess whether a spatial trend for the compositions needs to be considered.
dtt <- dt[subset=="training",]
#dtt <- dt
dtcomp = dtt %>% select(Al:Ni) %>% acomp # calculate compositions
X = dtt %>% select(X, Y) #coordinates
swath(dtcomp, loc=X$X) # quite flat --> constant mean
In X-direction no larger trends are seen.
swath(dtcomp, loc=X$Y)
In Y-direction, local trends can be found. This is especially true for the Mg compositions at low Y-values, and to some extent for Co and Al. However, this does not apply to all variables; as such, a constant mean is assumed.
Shown below is the omnidirectional variography as well as the variography taking anisotropy into account. The omnidirectional variogram also shows the fitting results as red lines. The anisotropic variography is shown only for comparison; the modeling process uses the fitted omnidirectional variograms.
# define the gmSpatialModel object
gmv=NULL
gmv = make.gmCompositionalGaussianSpatialModel(
data = dtcomp, # use
coords = X, # spatial coordinates
V = "alr", # use alr logratios
formula = ~1, # no drift (constant mean)
nmax = 20, # cokriging neighbourhood has max. 20 samples
maxdist = 300 # search radius
)
# compute logratio variograms
lrvg = gmGeostats::logratioVariogram(gmv,maxdist=300, nbins=8)
# define a compositional linear model
lrmd = CompLinModCoReg(~nugget()+sph(200), comp=dtcomp)
# fit
lrmdf = gmGeostats::fit_lmc(v = lrvg, model=lrmd)
## iteration = 0:   Function Value 10892.98
## iteration = 100: Function Value 49.36508
## Iteration limit exceeded. Algorithm failed. [optimizer message, translated from German]
(The optimizer stopped at its iteration limit; the objective nonetheless decreased from ca. 10893 to ca. 49, and the resulting model is used. The full parameter and gradient trace is omitted here.)
# plot
plot(lrvg, lrvg=vgram2lrvgram(lrmdf))
Shown below is a plot of the directional behavior of the variograms.
#
gmGeostats::logratioVariogram(gmv, maxdist=300, nbins=8,
azimuth=(0:11)*30, azimuth.tol=60) %>%
image
It can be seen that in E-W direction, a generally lower variation is observed for most logratio pairs. Hence, using anisotropy would be advisable.
Note: While the anisotropic model can be fitted, it cannot be plotted due to an unknown error. Hence, an omnidirectional model is used, since the quality of the anisotropic fit cannot be evaluated. The code for the fitting (and the intended plotting) of an anisotropic model is included below for reference.
lrvga = gmGeostats::logratioVariogram(gmv, maxdist=300, nbins=10,
azimuth=c(0,45,90,115), azimuth.tol=60)
colors <- c("blue", "red", "yellow", "purple")
lrmda = LMCAnisCompo(Z=dtcomp,azimuths=c(0,45,90,115), models=c("nugget", "sph"))
lrmdfa = gmGeostats::fit_lmc(v = lrvga,g=gmv, model=lrmda)
plot(lrvga, lrvg=vgram2lrvgram(lrmdfa))
After the fitting of the logratio variograms, the actual modeling commences. In the following, the kriging results for the backtransformed logratio compositions of the three elements Al, Co and Ni are shown.
gmv = make.gmCompositionalGaussianSpatialModel(
data = dtcomp, # compositional data
coords = X, # spatial coordinates
V = "alr", # use alr logratios
# formula = ~1+Y, # !!!creates bug at interpolation
formula = ~1, # no drift (constant mean)
nmax = 30, # cokriging neighbourhood has max. 30 samples
maxdist = 300,
omax = 6,
model = as.LMCAnisCompo(lrmdf) # necessary conversion
)
#logratio cokriging
cokrig_alr = predict(gmv, newdata=xy_grid, debug.level=-1)
## starting cokriging
## Linear Model of Coregionalization found. Good.
## [using ordinary cokriging]
# cutoff of maximum variance
var_cutoff <- cokrig_alr$alr1.var %>% quantile(0.9, na.rm = T)
cokrig_alr <- cokrig_alr %>% as.data.table()
cokrig_alr <- cokrig_alr[alr1.var < var_cutoff]
#backtransforming to original variables
cokrig_compo = gsi.gstatCokriging2compo(
cokrig_alr, V="alr", orignames = colnames(dtcomp))
#transform into percent
cokrig_compo <- cokrig_compo %>% as.data.frame %>% multiply_by(100)
#model performance for comparison
cok_alr_val <- predict(gmv, newdata=dt[subset=="validation", c("X", "Y")], debug.level=-1, nmax=20, omax = 6)
## starting cokriging
## Linear Model of Coregionalization found. Good.
## [using ordinary cokriging]
cok_alr_val = cbind(cok_alr_val[,1:2],
gsi.gstatCokriging2compo(
cok_alr_val, V="alr", orignames = colnames(dtcomp)) %>%
as.data.frame %>% multiply_by(100)
)
ggplot(data = cokrig_alr, aes(x=X, y=Y, fill=cokrig_compo$Al))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="Al [%]")
ggplot(data = cokrig_alr, aes(x=X, y=Y, fill=cokrig_compo$Co))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="Co [%]")
ggplot(data = cokrig_alr, aes(x=X, y=Y, fill=cokrig_compo$Ni))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="Ni [%]")
In the following, the cosimulations are shown. The cosimulations use the variographic parameters from their kriging counterparts; as such, the results in this chapter are summarized. First, the actual modeling is executed for the three models VS1 - VS3.
For this preliminary modeling, the number of realizations is set to 10.
n=10
# simulation for cokriging with log-values
set.seed(123)
cosim = predict(gs_l, newdata=xy_grid, debug.level=-1, nmax=20, omax = 6, nsim = n)
## drawing 10 multivariate GLS realisations of beta...
## Linear Model of Coregionalization found. Good.
## [using conditional Gaussian cosimulation]
# simulation for logratio compositions
cos_alr = predict(gmv, newdata=xy_grid, debug.level=-1,nmax=20, omax=6, nsim=n)
## starting cokriging
## drawing 10 multivariate GLS realisations of beta...
## Linear Model of Coregionalization found. Good.
## [using conditional Gaussian cosimulation]
cos_alr1 <- cos_alr # backup of the raw simulation output (restored again further below)
# outcome analysis
class(cos_alr)
## [1] "data.frame"
dim(cos_alr)
## [1] 15759 52
colnames(cos_alr)
## [1] "X" "Y" "alr1.sim1" "alr1.sim2" "alr1.sim3"
## [6] "alr1.sim4" "alr1.sim5" "alr1.sim6" "alr1.sim7" "alr1.sim8"
## [11] "alr1.sim9" "alr1.sim10" "alr2.sim1" "alr2.sim2" "alr2.sim3"
## [16] "alr2.sim4" "alr2.sim5" "alr2.sim6" "alr2.sim7" "alr2.sim8"
## [21] "alr2.sim9" "alr2.sim10" "alr3.sim1" "alr3.sim2" "alr3.sim3"
## [26] "alr3.sim4" "alr3.sim5" "alr3.sim6" "alr3.sim7" "alr3.sim8"
## [31] "alr3.sim9" "alr3.sim10" "alr4.sim1" "alr4.sim2" "alr4.sim3"
## [36] "alr4.sim4" "alr4.sim5" "alr4.sim6" "alr4.sim7" "alr4.sim8"
## [41] "alr4.sim9" "alr4.sim10" "alr5.sim1" "alr5.sim2" "alr5.sim3"
## [46] "alr5.sim4" "alr5.sim5" "alr5.sim6" "alr5.sim7" "alr5.sim8"
## [51] "alr5.sim9" "alr5.sim10"
#debugged version to backtransform results in one container
cos_alr <- cos_alr1
cos_alr <- cos_alr[,-c(1,2)]
alrs <- c("alr1", "alr2", "alr3", "alr4","alr5")
cos_comp <- list()
for (i in 1:n){
cos_comp[[i]] <- cos_alr %>% select(paste0(alrs,".sim", i)) %>%alrInv(z = .,orig = dtcomp)
}
# mixed model cosimulation
cos_cs = predict(gs_cs, newdata=xy_grid, debug.level = -1, nmax=20, omax = 6, nsim=n) %>% as.data.table()
## drawing 10 multivariate GLS realisations of beta...
## Linear Model of Coregionalization found. Good.
## [using conditional Gaussian cosimulation]
#backup results for possibility of subsequent code changing and rerun
cosim.BU <- cosim
cos_comp.BU <- cos_comp
cos_cs.BU <- cos_cs
The subsequent plots show one realization of each modeling procedure, using Ni (backtransformed) as an example. This only allows a general qualitative overview of the methods.
b <- ggplot(data = cosim, aes(x=X, y=Y, fill=exp(Ni_log.sim1)))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
#geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="Ni [%]", title ="VS1")
a <- ggplot(data = cos_cs, aes(x=X, y=Y, fill=exp(Ni_log.sim1)))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
#geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="Ni [%]", title ="VS2")
c <- ggplot(data = cos_comp[[3]] %>% as.data.frame(), aes(x=cosim$X, y=cosim$Y, fill=Ni*100))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
#geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="Ni [%]", title ="VS3")
grid.arrange(b, a,c, nrow=2)
From a qualitative point of view, the results are relatively similar. The distribution of the values, however, deviates from realisation to realisation; this is especially visible in the SW area of the survey field.
To allow a more condensed assessment of the three methods, the mean values over all simulated realisations are shown below.
# mean values of cosimulations
#reset containers from BU (not necessary, only for rerunning this chunk with varied options)
cosim.BU -> cosim
cos_comp.BU -> cos_comp
cos_cs.BU -> cos_cs
#cosimulation with logvalues VS1
for (i in fkt_new) {
cosim[,paste0(i,".mean")] <- cosim %>% select(contains(i)) %>% exp() %>% apply(.,1, mean) %>% as.vector()
cosim[,paste0(i,".vc")] <- cosim %>% select(contains(i))%>% exp() %>% apply(.,1, function (x)(sd(x)/mean(x))) %>% as.vector()
}
# mixed model VS2
for (i in fkt_new) {
cos_cs[,paste0(i,".mean")] <- cos_cs %>% select(contains(i))%>% exp() %>% apply(.,1, mean) %>% as.vector()
cos_cs[,paste0(i,".vc")] <- cos_cs %>% select(contains(i))%>% exp() %>% apply(.,1, function (x)(sd(x)/mean(x))) %>% as.vector()
}
# logratio cosimulation VS3
cos_comp1 <- array(as.numeric(unlist(cos_comp)), dim=c(nrow(cos_comp[[1]]),
ncol(cos_comp[[1]]),
n))
cos_comp_sum <- data.table()
o <- 0
for (i in colnames.num) {
o <- o+1
cos_comp_sum[,paste0(i,".mean")] <- cos_comp1[,o,] %>% as.data.frame() %>% apply(.,1, mean) %>% as.vector()*100
cos_comp_sum[,paste0(i,".vc")] <- cos_comp1[,o,] %>% as.data.frame() %>% apply(.,1, function (x)(sd(x*100)/mean(x*100))) %>% as.vector()
}
cos_comp_sum <- cbind(cos_comp_sum, X=cosim$X, Y=cosim$Y)
a <- ggplot(data = cosim, aes(x=X, y=Y, fill=(cosim$Ni_log.mean)))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
#geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="Ni [%]", title ="VS1")
b <- ggplot(data = cos_cs, aes(x=X, y=Y, fill=(cos_cs$Ni_log.mean)))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
#geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="Ni [%]", title ="VS2")
c <- ggplot(data = cos_comp_sum, aes(x=X, y=Y, fill=Ni.mean))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
#geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="Ni [%]", title ="VS3")
maxlegend <- c(a$data$Ni_log.mean, b$data$Ni_log.mean, c$data$Ni.mean) %>% max()
a <- a+scale_fill_gradientn(colors = c("blue2","purple", "red2"), limits=c(0,maxlegend))
b <- b+scale_fill_gradientn(colors = c("blue2","purple", "red2"), limits=c(0,maxlegend))
c <- c+scale_fill_gradientn(colors = c("blue2","purple", "red2"), limits=c(0,maxlegend))
grid.arrange(a,b,c, nrow=2)
To allow a comparison, the maximum of the color scale was set to the maximum over all three plots (1.976 %).
Generally, all three plots show similar results. In the SW of the survey field, a structure with higher mineralization can be found. The results for VS1 are generally higher in the peak areas, and the peaks cover a larger area. VS2 and VS3 show similar values, with VS3 yielding lower values in the mineralized areas.
In the following, the distribution of the coefficient of variation (VC), calculated over all simulated realisations, is shown. This highlights areas of higher or lower variation and reveals differences in the general behavior of the simulation techniques. In a practical mine-planning application, it also identifies areas where the element contents are expected to vary more; such information could, for example, influence stockpiling strategies.
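The per-cell summary used in these maps can be sketched in a few lines of base R; `sims` here is a hypothetical matrix (one row per grid cell, one column per back-transformed realisation), not the actual simulation output:

```r
# toy stand-in for the simulation output: 4 grid cells, 10 realisations
set.seed(42)
sims <- matrix(exp(rnorm(40, mean = 0, sd = 0.5)), nrow = 4, ncol = 10)

cell_mean <- apply(sims, 1, mean)                         # per-cell mean over realisations
cell_vc   <- apply(sims, 1, function(x) sd(x) / mean(x))  # per-cell CV = sd / mean (unitless)
```

Because the CV is scaled by the local mean, it allows comparing the relative spread between high- and low-grade areas.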
b <- ggplot(data = cosim, aes(x=X, y=Y, fill=(cosim$Ni_log.vc)))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
#geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="VC", title ="Cosimulation w. log-values")
a <- ggplot(data = cos_cs, aes(x=X, y=Y, fill=(cos_cs$Ni_log.vc)))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
#geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="VC", title ="Mixed model cosimulation w. log-values")
c <- ggplot(data = cos_comp_sum, aes(x=X, y=Y, fill=Ni.vc))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
#geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="VC", title ="Compositional logratio cosimulation")
grid.arrange(b,a,c, nrow=2)
Large differences appear between VS3 and the first two simulation methods. Where VS1 and VS2 show areas of heightened variation, the logratio cosimulation shows a more homogeneous picture. Inside the sampled area, the variation is clearly smaller than outside of it. VS1 and VS2 show smaller patches of high variance on the borders of the survey area. Furthermore, the influence of the anisotropy in VS2 can be seen: it stretches the highly varying areas into a more elliptic shape, with the main axis following the anisotropy angle chosen before (156°).
In order to evaluate the performance of the three simulation models, the validation part of the dataset is used for cross validation. Here, the predicted values (back-transformed) are compared to the observed values of the validation set. As before, 10 realisations are used for the simulations.
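Averaging the back-transformed realisations, as done here, avoids the bias of naively exponentiating an averaged log-prediction; a minimal sketch with synthetic lognormal draws:

```r
# for lognormal values, exp() of the mean log-value underestimates
# the arithmetic mean (Jensen's inequality); z stands in for simulated log-grades
set.seed(1)
z <- rnorm(1e5, mean = 0, sd = 1)
mean(exp(z))  # average of back-transformed realisations, approx. exp(0.5)
exp(mean(z))  # naive back-transform of the averaged logs, approx. 1
```

This is why the mean grades above are computed from the exponentiated simulations rather than from the log-scale means.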
# VS1: cosimulation with log-values (validation points)
set.seed(123)
cosim_val = predict(gs_l, newdata=dt[subset=="validation", c("X", "Y")], debug.level=-1, nmax=20, omax = 6, nsim = n)
## drawing 10 multivariate GLS realisations of beta...
## Linear Model of Coregionalization found. Good.
## [using conditional Gaussian cosimulation]
##
100% done
# mixed model cosimulation
cos_cs_val = predict(gs_cs, newdata=dt[subset=="validation", c("X", "Y")], debug.level = -1, nmax=20, omax = 6, nsim=n) %>% as.data.table()
## drawing 10 multivariate GLS realisations of beta...
## Linear Model of Coregionalization found. Good.
## [using conditional Gaussian cosimulation]
##
100% done
# simulation for logratio compositions
cos_alr_val = predict(gmv, newdata=dt[subset=="validation", c("X", "Y")], debug.level=-1,nmax=20, omax=6, nsim=n)
## starting cokriging
## drawing 10 multivariate GLS realisations of beta...
## Linear Model of Coregionalization found. Good.
## [using conditional Gaussian cosimulation]
##
100% done
# outcome analysis
class(cos_alr)
## [1] "data.frame"
dim(cos_alr)
## [1] 15759 50
colnames(cos_alr)
## [1] "alr1.sim1" "alr1.sim2" "alr1.sim3" "alr1.sim4" "alr1.sim5"
## [6] "alr1.sim6" "alr1.sim7" "alr1.sim8" "alr1.sim9" "alr1.sim10"
## [11] "alr2.sim1" "alr2.sim2" "alr2.sim3" "alr2.sim4" "alr2.sim5"
## [16] "alr2.sim6" "alr2.sim7" "alr2.sim8" "alr2.sim9" "alr2.sim10"
## [21] "alr3.sim1" "alr3.sim2" "alr3.sim3" "alr3.sim4" "alr3.sim5"
## [26] "alr3.sim6" "alr3.sim7" "alr3.sim8" "alr3.sim9" "alr3.sim10"
## [31] "alr4.sim1" "alr4.sim2" "alr4.sim3" "alr4.sim4" "alr4.sim5"
## [36] "alr4.sim6" "alr4.sim7" "alr4.sim8" "alr4.sim9" "alr4.sim10"
## [41] "alr5.sim1" "alr5.sim2" "alr5.sim3" "alr5.sim4" "alr5.sim5"
## [46] "alr5.sim6" "alr5.sim7" "alr5.sim8" "alr5.sim9" "alr5.sim10"
#debugged version to backtransform results in one container
cos_alr_val <- cos_alr_val[,-c(1,2)]
alrs <- c("alr1", "alr2", "alr3", "alr4","alr5")
i=1
cos_comp_val <- list()
for (i in 1:n){
cos_comp_val[[i]] <- cos_alr_val %>% select(paste0(alrs,".sim", i)) %>%alrInv(z = .,orig = dtcomp)
}
# mean values for validation
#cosimulation with logvalues
for (i in fkt_new) {
cosim_val[,paste0(i,".mean")] <- cosim_val %>% select(contains(i)) %>% exp() %>% apply(.,1, mean) %>% as.vector()
cosim_val[,paste0(i,".vc")] <- cosim_val %>% select(contains(i))%>% exp() %>% apply(.,1, function (x)(sd(x)/mean(x))) %>% as.vector()
}
# mixed model
for (i in fkt_new) {
cos_cs_val[,paste0(i,".mean")] <- cos_cs_val %>% select(contains(i))%>% exp() %>% apply(.,1, mean) %>% as.vector()
cos_cs_val[,paste0(i,".vc")] <- cos_cs_val %>% select(contains(i))%>% exp() %>% apply(.,1, function (x)(sd(x)/mean(x))) %>% as.vector()
}
# logratio cosimulation
cos_comp1_val <- array(as.numeric(unlist(cos_comp_val)), dim=c(nrow(cos_comp_val[[1]]),
ncol(cos_comp_val[[1]]),
n))
cos_comp_sum_val <- data.table()
o <- 0
for (i in colnames.num) {
o <- o+1
cos_comp_sum_val[,paste0(i,".mean")] <- cos_comp1_val[,o,] %>% as.data.frame() %>% apply(.,1, mean) %>% as.vector() *100 # multiply by 100 to obtain %
cos_comp_sum_val[,paste0(i,".vc")] <- cos_comp1_val[,o,] %>% as.data.frame() %>% apply(.,1, function (x)(sd(x*100)/mean(x*100))) %>% as.vector() # multiply by 100 to obtain %
}
cos_comp_sum_val <- cbind(cos_comp_sum_val,
X=dt[subset=="validation", c("X")],
Y=dt[subset=="validation", c("Y")])
The mean absolute error of the prediction is calculated.
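The error measures computed below can be stated compactly; `pred` and `obs` are short hypothetical vectors, not the validation data:

```r
pred <- c(1.2, 0.8, 2.5, 1.0)  # hypothetical back-transformed predictions
obs  <- c(1.0, 1.0, 2.0, 1.5)  # hypothetical observed validation values

mae <- mean(abs(obs - pred))        # mean absolute error
mre <- mean(abs(obs - pred) / obs)  # mean relative error, scaled by the observation
```

The MRE weights each error by the observed grade, which makes errors on low-grade samples count more heavily than in the MAE.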
#dataframe consisting of predicted and actual validation values
#for VK1 cokriging with log values
dt.kv_l <- data.table(cokriged_val %>% select(contains(".pred")) %>% exp,
dt[subset=="validation"] %>% select(fkt_new) %>% exp)
sumry_l <- dt.kv_l %>% summarise(.,
Name = "VK1",
Al_MAE = mean(abs(Al_log-Al_log.pred)),
Co_MAE = mean(abs(Co_log-Co_log.pred)),
Fe_MAE = mean(abs(Fe_log-Fe_log.pred)),
Filler_MAE = mean(abs(Filler_log-Filler_log.pred)),
Mg_MAE = mean(abs(Mg_log-Mg_log.pred)),
Ni_MAE = mean(abs(Ni_log-Ni_log.pred)))
#for VK2 mixed model cokriging
dt.kv_cs <- data.table(cok_cs_val %>% select(paste0(fkt_new,".pred")) %>% exp,
dt[subset=="validation"] %>% select(fkt_new) %>% exp)
sumry_cs <- dt.kv_cs %>% summarise(.,
Name = "VK2",
Al_MAE = mean(abs(Al_log-Al_log.pred)),
Co_MAE = mean(abs(Co_log-Co_log.pred)),
Fe_MAE = mean(abs(Fe_log-Fe_log.pred)),
Filler_MAE = mean(abs(Filler_log-Filler_log.pred)),
Mg_MAE = mean(abs(Mg_log-Mg_log.pred)),
Ni_MAE = mean(abs(Ni_log-Ni_log.pred)))
# for VK3 logratio cokriging
dt.alr <- data.table(cok_alr_val[,-c(1,2)] ,
dt[subset=="validation"] %>% select(fkt_new) %>% exp)
sumry_alr <- dt.alr %>% summarise(.,
Name = "VK3",
Al_MAE = mean(abs((Al_log)-(Al))),
Co_MAE = mean(abs((Co_log)-(Co))),
Fe_MAE = mean(abs((Fe_log)-(Fe))),
Filler_MAE = mean(abs((Filler_log)-(Filler))),
Mg_MAE = mean(abs((Mg_log)-(Mg))),
Ni_MAE = mean(abs((Ni_log)-(Ni))))
#######for cosimulations######
#for VS1 cosimulation with log values
dt.cv_l <- data.table(cosim_val %>% select(contains(".mean")),
dt[subset=="validation"] %>% select(fkt_new) %>% exp)
sumry_cvl <- dt.cv_l %>% summarise(.,
Name = "VS1",
Al_MAE = mean(abs(Al_log-Al_log.mean)),
Co_MAE = mean(abs(Co_log-Co_log.mean)),
Fe_MAE = mean(abs(Fe_log-Fe_log.mean)),
Filler_MAE = mean(abs(Filler_log-Filler_log.mean)),
Mg_MAE = mean(abs(Mg_log-Mg_log.mean)),
Ni_MAE = mean(abs(Ni_log-Ni_log.mean)))
#for VS2 mixed model cosimulation
dt.cv_cs <- data.table(cos_cs_val %>% select(paste0(fkt_new,".mean")),
dt[subset=="validation"] %>% select(fkt_new) %>% exp)
sumry_cv_cs <- dt.cv_cs %>% summarise(.,
Name = "VS2",
Al_MAE = mean(abs(Al_log-Al_log.mean)),
Co_MAE = mean(abs(Co_log-Co_log.mean)),
Fe_MAE = mean(abs(Fe_log-Fe_log.mean)),
Filler_MAE = mean(abs(Filler_log-Filler_log.mean)),
Mg_MAE = mean(abs(Mg_log-Mg_log.mean)),
Ni_MAE = mean(abs(Ni_log-Ni_log.mean)))
# for VS3 logratio cosimulation
dt.cos_alr <- data.table(cos_comp_sum_val ,
dt[subset=="validation"] %>% select(fkt_new) %>% exp)
sumry_cos_alr <- dt.cos_alr %>% summarise(.,
Name = "VS3",
Al_MAE = mean(abs(Al_log-Al.mean)),
Co_MAE = mean(abs(Co_log-Co.mean)),
Fe_MAE = mean(abs(Fe_log-Fe.mean)),
Filler_MAE = mean(abs(Filler_log-Filler.mean)),
Mg_MAE = mean(abs(Mg_log-Mg.mean)),
Ni_MAE = mean(abs(Ni_log-Ni.mean)))
####Summary####
sum_val <- rbind(sumry_l, sumry_cs, sumry_alr,
sumry_cvl,sumry_cv_cs,sumry_cos_alr) %>% kable(table.attr = "style = \"color: black;\"") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
sum_val
| Name | Al_MAE | Co_MAE | Fe_MAE | Filler_MAE | Mg_MAE | Ni_MAE |
|---|---|---|---|---|---|---|
| VK1 | 2.202556 | 0.0447709 | 8.856579 | 7.082687 | 3.946235 | 0.3153578 |
| VK2 | 2.270978 | 0.0443517 | 9.107742 | 7.221644 | 4.011018 | 0.3164846 |
| VK3 | 2.201928 | 0.0445365 | 8.806227 | 8.022171 | 3.882312 | 0.3085519 |
| VS1 | 2.644393 | 0.0549201 | 9.684120 | 7.184744 | 4.175863 | 0.3193899 |
| VS2 | 2.632025 | 0.0559316 | 9.170857 | 7.292397 | 3.990590 | 0.3170055 |
| VS3 | 6.447633 | 0.0540879 | 9.678661 | 8.847417 | 3.863677 | 0.3230897 |
# temp <- list()
# for (i in fkt_new) {
# temp[[i]] <- ggplot(data=dt.kv_l, aes(x = get(paste0(i, ".pred")), y=get(i)))+
# geom_point(size=0.5)+
# geom_abline(intercept = 0)+
# labs(title = i, x="Pred. V.", y= "Real V.")
#
#
# }
#
# do.call("grid.arrange", c(temp, ncol=3))
The variants generally show similar errors. Within this assignment, the variable Ni is of particular interest; here, VS1 and VS2 show very similar, low errors, while VK3 shows the best overall performance.
#dataframe consisting of predicted and actual validation values
#for VK1 cokriging with log values
dt.kv_l <- data.table(cokriged_val %>% select(contains(".pred")) %>% exp,
dt[subset=="validation"] %>% select(fkt_new) %>% exp)
sumry_l <- dt.kv_l %>% summarise(.,
Name = "VK1",
Al_MRE = mean(abs(Al_log-Al_log.pred)/Al_log),
Co_MRE = mean(abs(Co_log-Co_log.pred)/Co_log),
Fe_MRE = mean(abs(Fe_log-Fe_log.pred)/Fe_log),
Filler_MRE = mean(abs(Filler_log-Filler_log.pred)/Filler_log),
Mg_MRE = mean(abs(Mg_log-Mg_log.pred)/Mg_log),
Ni_MRE = mean(abs(Ni_log-Ni_log.pred)/Ni_log))
#for VK2 mixed model cokriging
dt.kv_cs <- data.table(cok_cs_val %>% select(paste0(fkt_new,".pred"))%>% exp,
dt[subset=="validation"] %>% select(fkt_new)%>% exp)
sumry_cs <- dt.kv_cs %>% summarise(.,
Name = "VK2",
Al_MRE = mean(abs(Al_log-Al_log.pred)/Al_log),
Co_MRE = mean(abs(Co_log-Co_log.pred)/Co_log),
Fe_MRE = mean(abs(Fe_log-Fe_log.pred)/Fe_log),
Filler_MRE = mean(abs(Filler_log-Filler_log.pred)/Filler_log),
Mg_MRE = mean(abs(Mg_log-Mg_log.pred)/Mg_log),
Ni_MRE = mean(abs(Ni_log-Ni_log.pred)/Ni_log))
# for VK3 logratio cokriging
dt.alr <- data.table(cok_alr_val[,-c(1,2)] ,
dt[subset=="validation"] %>% select(fkt_new) %>% exp)
sumry_alr <- dt.alr %>% summarise(.,
Name = "VK3",
Al_MRE = mean(abs(Al_log-Al)/Al_log),
Co_MRE = mean(abs(Co_log-Co)/Co_log),
Fe_MRE = mean(abs(Fe_log-Fe)/Fe_log),
Filler_MRE = mean(abs(Filler_log-Filler)/Filler_log),
Mg_MRE = mean(abs(Mg_log-Mg)/Mg_log),
Ni_MRE = mean(abs(Ni_log-Ni)/Ni_log))
#######for cosimulations######
#for VS1 cosimulation with log values
dt.cv_l <- data.table(cosim_val %>% select(contains(".mean")),
dt[subset=="validation"] %>% select(fkt_new) %>% exp)
sumry_cvl <- dt.cv_l %>% summarise(.,
Name = "VS1",
Al_MRE = mean(abs(Al_log-Al_log.mean)/Al_log),
Co_MRE = mean(abs(Co_log-Co_log.mean)/Co_log),
Fe_MRE = mean(abs(Fe_log-Fe_log.mean)/Fe_log),
Filler_MRE = mean(abs(Filler_log-Filler_log.mean)/Filler_log),
Mg_MRE = mean(abs(Mg_log-Mg_log.mean)/Mg_log),
Ni_MRE = mean(abs(Ni_log-Ni_log.mean)/Ni_log))
#for VS2 mixed model cosimulation
dt.cv_cs <- data.table(cos_cs_val %>% select(paste0(fkt_new,".mean")),
dt[subset=="validation"] %>% select(fkt_new) %>% exp)
sumry_cv_cs <- dt.cv_cs %>% summarise(.,
Name = "VS2",
Al_MRE = mean(abs(Al_log-Al_log.mean)/Al_log),
Co_MRE = mean(abs(Co_log-Co_log.mean)/Co_log),
Fe_MRE = mean(abs(Fe_log-Fe_log.mean)/Fe_log),
Filler_MRE = mean(abs(Filler_log-Filler_log.mean)/Filler_log),
Mg_MRE = mean(abs(Mg_log-Mg_log.mean)/Mg_log),
Ni_MRE = mean(abs(Ni_log-Ni_log.mean)/Ni_log))
# for VS3 logratio cosimulation
dt.cos_alr <- data.table(cos_comp_sum_val ,
dt[subset=="validation"] %>% select(fkt_new) %>% exp)
sumry_cos_alr <- dt.cos_alr %>% summarise(.,
Name = "VS3",
Al_MRE = mean(abs(Al_log-Al.mean)/Al_log),
Co_MRE = mean(abs(Co_log-Co.mean)/Co_log),
Fe_MRE = mean(abs(Fe_log-Fe.mean)/Fe_log),
Filler_MRE = mean(abs(Filler_log-Filler.mean)/Filler_log),
Mg_MRE = mean(abs(Mg_log-Mg.mean)/Mg_log),
Ni_MRE = mean(abs(Ni_log-Ni.mean)/Ni_log))
####Summary####
sum_val_MRE <- rbind(sumry_l, sumry_cs, sumry_alr,
sumry_cvl,sumry_cv_cs,sumry_cos_alr)
sum_val_MRE %>% kable(table.attr = "style = \"color: black;\"") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
| Name | Al_MRE | Co_MRE | Fe_MRE | Filler_MRE | Mg_MRE | Ni_MRE |
|---|---|---|---|---|---|---|
| VK1 | 0.8038649 | 1.032103 | 0.4929078 | 0.1135097 | 1.762340 | 0.6288471 |
| VK2 | 0.8314625 | 1.021722 | 0.5062684 | 0.1159618 | 1.779700 | 0.6405719 |
| VK3 | 0.8615388 | 1.103832 | 0.5315130 | 0.1367375 | 1.847911 | 0.6715783 |
| VS1 | 1.1192446 | 1.872803 | 0.5595885 | 0.1154562 | 2.359084 | 0.6804091 |
| VS2 | 1.2189431 | 2.145350 | 0.5609244 | 0.1175318 | 2.419242 | 0.6784959 |
| VS3 | 3.1697966 | 1.963536 | 0.5921043 | 0.1392523 | 2.168711 | 0.6762911 |
The MRE is generally high, ranging from 0.11 to 0.14 for Filler up to 3.17 for Al under VS3. For the focus variable Ni, the lowest MRE among the simulation methods is achieved by VS3. However, VS3 performs poorly for Al and is therefore not used for the final resource estimation. A further constraint is the original task's demand to assess the expectable variation of the resource estimate, so a Gaussian simulation has to be used. Hence, VS1 is chosen for the final estimation; it yields slightly smaller prediction errors than the more complex VS2 for most variables. Another reason not to use VS2 are the very long calculation times and the frequent freezes that occurred when running it with more than 10 realisations.
The mean deviation (MD) gives an overview of the over- or underprediction of the models. However, it should not be used as the sole error measure: since positive and negative errors cancel out, it can be misleading on its own and should be used in conjunction with measures like the ones above.
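The cancellation effect can be illustrated in two lines with toy residuals:

```r
err <- c(0.5, -0.5, 0.5, -0.5)  # toy residuals (observed - predicted)
mean(err)       # mean deviation: 0, errors cancel and suggest a perfect model
mean(abs(err))  # mean absolute error: 0.5, reveals the actual error level
```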
#dataframe consisting of predicted and actual validation values
#for VK1 cokriging with log values
dt.kv_l <- data.table(cokriged_val %>% select(contains(".pred")) %>% exp(),
dt[subset=="validation"] %>% select(fkt_new) %>% exp())
sumry_l <- dt.kv_l %>% summarise(.,
Name = "VK1",
Al_MD = mean((Al_log-Al_log.pred)),
Co_MD = mean((Co_log-Co_log.pred)), # columns are already back-transformed above, no further exp()
Fe_MD = mean((Fe_log-Fe_log.pred)),
Filler_MD = mean((Filler_log-Filler_log.pred)),
Mg_MD = mean((Mg_log-Mg_log.pred)),
Ni_MD = mean((Ni_log-Ni_log.pred)))
#for VK2 mixed model cokriging
dt.kv_cs <- data.table(cok_cs_val %>% select(paste0(fkt_new,".pred")) %>% exp(),
dt[subset=="validation"] %>% select(fkt_new)%>% exp())
sumry_cs <- dt.kv_cs %>% summarise(.,
Name = "VK2",
Al_MD = mean((Al_log-Al_log.pred)),
Co_MD = mean((Co_log-Co_log.pred)),
Fe_MD = mean((Fe_log-Fe_log.pred)),
Filler_MD = mean((Filler_log-Filler_log.pred)),
Mg_MD = mean((Mg_log-Mg_log.pred)),
Ni_MD = mean((Ni_log-Ni_log.pred)))
# for VK3 logratio cokriging
dt.alr <- data.table(cok_alr_val[,-c(1,2)] ,
dt[subset=="validation"] %>% select(fkt_new) %>% exp)
sumry_alr <- dt.alr %>% summarise(.,
Name = "VK3",
Al_MD = mean(((Al_log)-(Al))),
Co_MD = mean(((Co_log)-(Co))),
Fe_MD = mean(((Fe_log)-(Fe))),
Filler_MD = mean(((Filler_log)-(Filler))),
Mg_MD = mean(((Mg_log)-(Mg))),
Ni_MD = mean(((Ni_log)-(Ni))))
#######for cosimulations######
#for VS1 cosimulation with log values
dt.cv_l <- data.table(cosim_val %>% select(contains(".mean")),
dt[subset=="validation"] %>% select(fkt_new) %>% exp())
sumry_cvl <- dt.cv_l %>% summarise(.,
Name = "VS1",
Al_MD = mean((Al_log-(Al_log.mean))),
Co_MD = mean((Co_log-(Co_log.mean))),
Fe_MD = mean((Fe_log-(Fe_log.mean))),
Filler_MD = mean((Filler_log-(Filler_log.mean))),
Mg_MD = mean((Mg_log-(Mg_log.mean))),
Ni_MD = mean((Ni_log-(Ni_log.mean))))
#for VS2 mixed model cosimulation
dt.cv_cs <- data.table(cos_cs_val %>% select(paste0(fkt_new,".mean")),
dt[subset=="validation"] %>% select(fkt_new) %>% exp())
sumry_cv_cs <- dt.cv_cs %>% summarise(.,
Name = "VS2",
Al_MD = mean((Al_log-(Al_log.mean))),
Co_MD = mean((Co_log-(Co_log.mean))),
Fe_MD = mean((Fe_log-(Fe_log.mean))),
Filler_MD = mean((Filler_log-(Filler_log.mean))),
Mg_MD = mean((Mg_log-(Mg_log.mean))),
Ni_MD = mean((Ni_log-(Ni_log.mean))))
# for VS3 logratio cosimulation
dt.cos_alr <- data.table(cos_comp_sum_val ,
dt[subset=="validation"] %>% select(fkt_new) %>% exp)
sumry_cos_alr <- dt.cos_alr %>% summarise(.,
Name = "VS3",
Al_MD = mean(((Al_log)-(Al.mean))),
Co_MD = mean(((Co_log)-(Co.mean))),
Fe_MD = mean(((Fe_log)-(Fe.mean))),
Filler_MD = mean(((Filler_log)-(Filler.mean))),
Mg_MD = mean(((Mg_log)-(Mg.mean))),
Ni_MD = mean(((Ni_log)-(Ni.mean))))
####Summary####
sum_val <- rbind(sumry_l, sumry_cs, sumry_alr,
sumry_cvl,sumry_cv_cs,sumry_cos_alr) %>% kable(table.attr = "style = \"color: black;\"") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
sum_val
| Name | Al_MD | Co_MD | Fe_MD | Filler_MD | Mg_MD | Ni_MD |
|---|---|---|---|---|---|---|
| VK1 | 0.5092826 | -0.9774758 | 2.0527079 | 0.3578403 | 2.389288 | 0.1378451 |
| VK2 | 0.8875703 | 0.0256881 | 2.8189492 | 0.4690732 | 3.069060 | 0.1606239 |
| VK3 | 0.5904066 | 0.0234014 | 1.3840424 | -4.9421026 | 2.823281 | 0.1209711 |
| VS1 | -0.5088598 | -0.0009743 | 0.7968986 | 0.1361651 | 1.614972 | 0.1067408 |
| VS2 | -0.7425656 | -0.0044831 | 0.8172421 | 0.2158945 | 2.470693 | 0.1238483 |
| VS3 | -5.7674948 | 0.0023425 | 1.3993417 | 1.7786226 | 2.456502 | 0.1306857 |
The mean deviation between prediction and observed value indicates whether a model generally tends to over- or underpredict. The kriging methods generally predict lower values, except for VK3-Filler, where the prediction is on average about 5 wt% too high. The cosimulations also predict lower values in most cases, except for Al and Co, where higher values are predicted. VS1 yields the lowest errors in this category.
For additional detail, the predicted values are plotted against the observed values from the validation dataset, giving a qualitative insight into the model quality. This is again done for the focus variable Ni.
i <- "Ni"
a <- ggplot(data=dt.kv_l, aes(x = get(paste0(i, "_log.pred"))%>% exp(), y=get(paste0(i, "_log"))%>% exp()))+
geom_point(size=0.5)+
geom_abline(intercept = 0)+
coord_fixed()+
labs(title = "VK1", x="Pred. V.", y= "Real V.")
b <- ggplot(data=dt.kv_cs, aes(x = get(paste0(i, "_log.pred"))%>% exp(), y=get(paste0(i, "_log"))%>% exp()))+
geom_point(size=0.5)+
geom_abline(intercept = 0)+
coord_fixed()+
labs(title = "VK2", x="Pred. V.", y= "Real V.")
c <- ggplot(data=dt.alr, aes(x = get(paste0(i)), y=get(paste0(i, "_log"))))+
geom_point(size=0.5)+
geom_abline(intercept = 0)+
coord_fixed()+
labs(title = "VK3", x="Pred. V.", y= "Real V.")
d <- ggplot(data=dt.cv_l, aes(x = get(paste0(i, "_log.mean")), y=get(paste0(i, "_log"))%>% exp()))+
geom_point(size=0.5)+
geom_abline(intercept = 0)+
coord_fixed()+
labs(title = "VS1", x="Pred. V.", y= "Real V.")
e <- ggplot(data=dt.cv_cs, aes(x = get(paste0(i, "_log.mean")), y=get(paste0(i, "_log")) %>% exp()))+
geom_point(size=0.5)+
geom_abline(intercept = 0)+
coord_fixed()+
labs(title = "VS2", x="Pred. V.", y= "Real V.")
f <- ggplot(data=dt.cos_alr, aes(x = get(paste0(i, ".mean")), y= get(paste0(i, "_log"))))+
geom_point(size=0.5)+
geom_abline(intercept = 0)+
coord_fixed()+
labs(title = "VS3", x="Pred. V.", y= "Real V.")
grid.arrange(a,b,c,d,e,f, nrow=2)
It can be seen that all models underpredict Ni to some extent. The methods VS1 and VS3 show the best fit from a qualitative point of view. A large portion of the underprediction can be attributed to relatively few outliers with very high observed values. Additionally, VS2 tends to overpredict at low value levels. In general, none of the models can reliably predict the scarce, very high values. This can be put into context as follows:
It appears that splitting the dataset in a chessboard-like pattern created very small structures with high Ni contents that were not properly captured by the variography. However, the complete dataset will be used for the final resource estimation, which might mitigate this effect to some extent.
VS1 is used for the final estimation, as it showed the best validation results for Ni among the Gaussian simulation models and requires much less computation time while running more reliably. The Lcode and the chemical variables are simulated separately. First, the chemical components are simulated.
#redraw extents (same as above - just for debugging)
extends <-dt[,c("X", "Y")] %>% sapply(range) %>% as.data.frame()
x = seq(from=min(extends$X)-100, to=max(extends$X)+100, by=10)
y = seq(from=min(extends$Y)-100, to=max(extends$Y)+100, by=10)
xy_grid = expand.grid(X=x, Y=y)
#initiate container
{gs_l_f=NULL
for(j in vars_l){
frm = as.formula(paste(j,1, sep="~"))
print(frm)
gs_l_f = gstat(gs_l_f, id=j, formula=frm, locations=~X+Y, data=dt, nmax = 20, omax=6, maxdist = 300)
}
}
## Al_log ~ 1
## Co_log ~ 1
## Fe_log ~ 1
## Filler_log ~ 1
## Mg_log ~ 1
## Ni_log ~ 1
# create variograms
vge_l_f = gstat::variogram(gs_l_f,width=35, cutoff= 300)
#create initial varmodels
vgt_l_m = vgm(psill=0.5, range=120, nugget=1, model="Sph", add.to=vgm(psill=0.8, range=300, model="Gau"))
#fitting
gs_l_f = gstat::fit.lmc(v=vge_l_f, g = gs_l_f, model = vgt_l_m, correct.diagonal = 1.00001)
gs_l_f_pred = predict(gs_l_f, newdata=xy_grid, debug.level = -1, nmax=20, omax = 6,nsim = 30)
## drawing 30 multivariate GLS realisations of beta...
## Linear Model of Coregionalization found. Good.
## [using conditional Gaussian cosimulation]
##
100% done
saveRDS(gs_l_f_pred, "prediction_final.RDS")
Subsequently, the Lcode is simulated, similar to the previously used method IK. The anisotropic variograms for the individual rock types are used, and the Lcodes are simulated sequentially, assuming spatial independence between the indicator variables.
#sequential - as IK kriging
gs_i_FZ <- gstat(id="FZ", formula = (Lcode=="FZ")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 200)
gs_i_SA <- gstat(id="SA", formula = (Lcode=="SA")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 200)
gs_i_SM <- gstat(id="SM", formula = (Lcode=="SM")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 200)
gs_i_UM <- gstat(id="UM", formula = (Lcode=="UM")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 200)
Lcodes <- c("FZ", "SA", "SM", "UM")
#initiate the variograms as loop for each LCode
for (i in c("FZ", "SA", "SM", "UM")){
assign(paste0("vg_i_", i),
variogram(get(paste0("gs_i_", i)),
width=35, cutoff=200, alpha=c(0:5)*30 ))
}
#fit FZ
#plot(vg_i_FZ)
vgt_FZ <- vgm(psill=0.1, model="Wav", range=120, nugget=0,anis = c(40,0.4),
add.to = vgm(psill=0.15, range=350, nugget=0.1, model="Gau"))
# plot(vg_i_FZ, model=vgt_FZ)
# fit SA
#plot(vg_i_SA)
vgt_SA <- vgm(psill=0.05, model="Wav", range=50, nugget=0.03,anis = c(0,0.4))
#plot(vg_i_SA, model=vgt_SA)
# fit SM
# plot(vg_i_SM)
vgt_SM <- vgm(psill=0.05, model="Wav", range=80, nugget=0.03,anis = c(0,0.4),
add.to = vgm(psill=0.15, range=350, nugget=0.1, model="Exp"))
#plot(vg_i_SM, model=vgt_SM)
#fit UM
# plot(vg_i_UM)
vgt_UM <- vgm(psill=0.05, model="Exp", range=150, nugget=0.02,anis = c(120,0.3),
add.to = vgm(psill=0.01, range=50, nugget=0, model="Wav",anis = c(30,0.3)))
#update gstats
for (i in Lcodes){
assign(paste0("gs_i_", i),
gstat(id=i,
formula = (Lcode==paste0(i))~1,
locations = ~X+Y,
data=dt,
model = get(paste0("vgt_",i)),
nmax=30)
)
}
#simulation (nsim = 30 per Lcode)
for (i in Lcodes){
assign(paste0("sim_", i),
predict(get(paste0("gs_i_", i)),
newdata=xy_grid, nsim = 30)
)
}
## drawing 30 GLS realisations of beta...
## [using conditional Gaussian simulation]
## drawing 30 GLS realisations of beta...
## [using conditional Gaussian simulation]
## drawing 30 GLS realisations of beta...
## [using conditional Gaussian simulation]
## drawing 30 GLS realisations of beta...
## [using conditional Gaussian simulation]
i="FZ"
#calculate mean simulation results
for (i in Lcodes){
assign(paste0("sim_", i, "_pred"),
get(paste0("sim_", i)) %>% select(-c(1,2)) %>% apply(., MARGIN = 1, mean) %>% as.data.frame()
)
}
# combine estimations
sim_I_probs <- cbind(sim_FZ_pred, sim_SA_pred, sim_SM_pred, sim_UM_pred)
colnames(sim_I_probs) <- paste0(Lcodes, ".pred")
sim_I_probs$Lcode <- sim_I_probs %>% max.col() %>%
factor(x = .,levels = c(1,2,3,4),labels = Lcodes)
sim_I_probs <- cbind(sim_I_probs, sim_SA[,1:2])
#Plotting the results
a <- ggplot(data = sim_I_probs, aes(x=X, y=Y, fill=sim_I_probs$Lcode))+
geom_raster()+
coord_fixed()+
geom_point(data=dt, inherit.aes = F, aes(x=X, y=Y, fill=Lcode), size=3, shape=21, alpha=0.7)+
labs(fill="Lcode", title = "Lcode")
a
It can be seen that the small patches for SA and UM now appear as more “complete” units. This can be attributed to the sampling density being roughly doubled now that the complete dataset is used.
#
cos_cs_f_pred <- readRDS(file = "prediction_final.RDS")
# calculate mean values for chemicals
for (i in fkt_new) {
cos_cs_f_pred[,paste0(i,".mean")] <- cos_cs_f_pred %>% select(contains(i))%>% exp() %>% apply(.,1, mean) %>% as.vector()
cos_cs_f_pred[,paste0(i,".vc")] <- cos_cs_f_pred %>% select(contains(i))%>% exp() %>% apply(.,1, function (x)(sd(x)/mean(x))) %>% as.vector()
}
#calculate sum LCode (not used right now)
# for (i in Lcodes) {
# cos_cs_f_pred[,(paste0(i,".sum"))] <-
# cos_cs_f_pred %>%
# select(contains(paste0(i, ".sim"))) %>% apply(.,1, median) %>% as.vector()
# }
# cos_cs_f_pred$Lcode.pred <- cos_cs_f_pred %>%
# select(contains(".sum")) %>% max.col() %>% factor(x = .,levels = c(1,2,3,4),labels = Lcodes)
plotly::plot_ly(title="Ni [%]") %>%
add_rasterly_heatmap(cos_cs_f_pred,
x= cos_cs_f_pred$X,
y= cos_cs_f_pred$Y,
z=cos_cs_f_pred$Ni_log.mean,
colorscale='Viridis') %>%
plotly::add_markers(data = dt, x=~X, y= ~Y, color= dt$Lcode,size=1,
marker=list(sizeref=10)) %>%
plotly::layout(title ="Ni [%] and Lcode sampling points",autosize = F )
It can be seen that the model tends to underestimate the peak behavior at certain points (best seen when zooming in). This behavior was already observed during the validation.
# preparation for cumulative distributions
cos_cs_f_pred %<>% as.data.table()
#melting dt
dt.m <- melt.data.table(cos_cs_f_pred %>% select(contains(".sim")) %>%
select(contains(fkt_new))) %>% as.data.table()
dt.m$value <- dt.m$value %>% exp()
# defining maximum and 0.999 quantile for Mg
Mg_max <- dt.m[variable %like% ("Mg")]$value %>% max() %>% round()
Mg_q999 <- dt.m[variable %like% ("Mg")]$value %>% quantile(.,0.999) %>% round()
In the following, the cumulative grade distributions for the individual elements are shown. The values are truncated at the 0.999 quantile because the chosen simulation method occasionally returns very high values for a grid node. For example, the maximum for Mg lies at 951 %. This is clearly impossible and can be attributed to the fact that a non-compositional method is used here. The 0.999 quantile, in contrast, lies at 92 %. The vertical lines in the plots indicate the 0.999 quantile.
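The truncation described above can be sketched as follows; the synthetic values here are hypothetical stand-ins for the back-transformed Mg simulations:

```r
# Sketch: cap back-transformed simulated grades at the 0.999 quantile.
set.seed(42)
vals <- c(rlnorm(10000, meanlog = 2, sdlog = 1), 951)  # long tail plus one extreme value
q999 <- quantile(vals, 0.999)
vals_trunc <- pmin(vals, q999)  # values above the quantile are capped
max(vals_trunc) <= q999         # TRUE
```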
temp <- list()
o=0
for (i in fkt_new) {
o=o+1
temp[[i]] <-
ggplot(dt.m[variable %like% (i)], aes(x = value, group = variable)) +
stat_ecdf(pad=F)+
geom_vline(xintercept = dt.m[variable %like% (i)]$value %>% max())+
geom_vline(xintercept = dt.m[variable %like% (i)]$value %>% quantile(.,0.999))+
labs(title=fkt[[o]], x = "content [%]")+
xlim(0,dt.m[variable %like% (i)]$value %>% quantile(.,0.999))
}
do.call("grid.arrange", c(temp, ncol=3))
To calculate the grade-cutoff tables underlying these plots, a custom function is used, as seen in the code:
# function to calculate a grade-cutoff table
GraTon.Tab <- function(use, max.quantile = 0.999, nsteps = 100, fromzero = F) {
if (fromzero == T) qm <- 0 # start from 0 ...
if (fromzero == F) qm <- use %>% min() # ... or from the minimum value
Q <- use %>% quantile(., max.quantile) # upper truncation quantile
steps <- seq(from = qm, to = Q, length.out = nsteps) # cutoff steps
gt <- data.table(cutoff = steps) # basic table
gt$n <- sapply(steps, function(x) sum(use > x)) # number of cells above cutoff
gt$prop <- gt$n / max(gt$n) # proportion of cells above cutoff
gt$mean <- sapply(steps, function(x) mean(use[use > x])) # mean grade above cutoff
gt
}
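A quick usage example of GraTon.Tab on synthetic grades (the distribution parameters below are arbitrary, chosen only for illustration):

```r
set.seed(1)
g <- rlnorm(5000, meanlog = -0.5, sdlog = 0.6)  # synthetic grades in %
gt <- GraTon.Tab(use = g, nsteps = 50)
head(gt)  # columns: cutoff, n, prop, mean
```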
In the grade-tonnage curves, the x-axis shows the increasing cutoff, while the blue line shows the proportion of tiles above the respective cutoff. In the 3-D case, this could be converted into tonnage above cutoff. The orange lines show the mean grade that the remaining ore above the given cutoff would have. These numbers allow a quick assessment of a given deposit’s feasibility, and the results from these graphs can be used to pre-estimate revenue for the deposit depending on the chosen cutoff (as ore content can be converted into monetary value). For a detailed financial feasibility assessment, however, additional factors like variable mining costs, processing efficiency, interest, and geometrical constraints need to be incorporated. For an open pit, an automated feasibility assessment can then be done (e.g. Lerchs-Grossmann or pseudoflow pit optimization). This can be combined with the Gaussian simulation to obtain information about the financial uncertainty of a given project.
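In 3-D, the count of cells above cutoff translates directly into tonnage once a block volume and a bulk density are assumed. A minimal sketch, where block dimensions and density are assumed values, not taken from the report:

```r
# Sketch: cells above cutoff -> tonnage (all inputs are assumed values).
cell_volume <- 25 * 25 * 10   # m^3 per block (assumed block dimensions)
density     <- 2.7            # t/m^3 (assumed bulk density)
n_above     <- 1200           # cells above the chosen cutoff, e.g. from GraTon.Tab
tonnage     <- n_above * cell_volume * density
tonnage  # 20,250,000 t of ore above cutoff
```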
#create individual cutoff tables for all iterations and merge them into one table
temp <- list()
CO_all <- list()
coeff <- NULL
for (i in fkt) {
#apply to extract all iterations of given element and calculate single cutoff-table
CO_all[[i]] <- lapply(dt.m[variable %like% (i)]$variable %>% unique(),
function(x) (GraTon.Tab(use = dt.m[variable == (x)]$value,max.quantile = 0.99))) %>%
rbindlist(l = .,idcol = "Nr") #coerce individual tables into one
#coefficient for secondary axis transformation
coeff[paste(i)] <- max(CO_all[[i]]$mean)
CO_all[[i]]$mean <- CO_all[[i]]$mean/coeff[paste(i)]
#formula for secondary axis transformation
frm = as.formula(paste0("~.*",coeff[paste(i)]))
temp[[i]] <- ggplot(CO_all[[i]], aes(x=cutoff, group = Nr)) +
geom_line(aes( y= prop), color="blue3")+
geom_line(aes(y= mean), color = "Orange3")+
scale_y_continuous(name= "Proportion above cut-off",
sec.axis = sec_axis(trans=frm, name="mean grade [%]"))+
labs(title = i)+
theme( axis.text.y.right = element_text(color = "orange3"),
axis.text.y.left = element_text(color = "blue3"))
}
do.call("grid.arrange", c(temp, ncol=3))
With the computed 30 iterations, it can be seen that the mean grade of Ni follows a relatively tight band, ranging from 0.5 % to roughly 2 % mean grade. If at least 50 % of the deposit’s tiles are to be mined, a cutoff of 0.4 % can be chosen; in this case the average grade would equal 0.7 %. If a mean grade of at least 1 % is to be achieved due to financial constraints, between 18 % and 25 % of the deposit would remain for mining. The thickness of the blue band indicates the uncertainty in the number of tiles above a certain cutoff.
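The width of the blue band can be quantified directly from the merged cutoff tables: for each iteration, read off the proportion at the cutoff step closest to 0.5 % and look at the spread. This sketch reuses `CO_all` from the code above and assumes the cutoff grid covers 0.5 %:

```r
co_ni <- CO_all[["Ni"]]
# one proportion value per iteration, at the cutoff step nearest 0.5 %
p05 <- co_ni[, .SD[which.min(abs(cutoff - 0.5))], by = Nr]$prop
range(p05)        # min and max proportion above a 0.5 % cutoff
diff(range(p05))  # width of the band, i.e. the tonnage uncertainty
```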
In this report, multiple geostatistical interpolation and modeling methods have been applied to investigate their suitability for a resource estimation problem. Three kriging methods and three Gaussian simulation techniques were used to estimate element contents of a multi-element composition. Generally, all methods showed high prediction errors. This might be attributed to the fact that the dataset comprises a 2-D slice of a 3-D structure, so valuable information about the spatial continuity is missing. However, the kriging methods in general achieved slightly lower prediction errors.
What additionally stands out is that the more complex methods did not necessarily perform better on the validation data. The omnidirectional logratio-compositional model performed slightly better than the mixed cokriging model that takes anisotropy into consideration. The best performance, however, was achieved by the general omnidirectional model that does not take the actual rock type into account. This is surprising because the exploratory data analysis clearly pointed towards a significant influence of the rock type on the element composition. It also illustrates, however, that every additional variable incorporated into the model generally increases the uncertainty.
For the estimation of resource variability, Gaussian simulation methods are necessary. Because of this, one Gaussian model was chosen for the final resource estimation: VS1, due to its performance advantages. VS1 is a Gaussian, omnidirectional cosimulation with log-transformed values.
The variability of the resource estimate could be shown with the help of cumulative distribution graphs and grade-tonnage graphs. Depending on the desired or necessary cutoff, a variation in the minable resources of e.g. 7 % at a cutoff of 0.5 % Ni has to be taken into account. Mining 7 % more or less ore is a considerable variation over the life of a mine, where efficiency gains of a few percent can have great monetary effects. As such, Gaussian simulation is capable of estimating the financial risk associated with a given mining operation to a much greater extent than kriging methods, which give no range of resource uncertainty. Kriging methods, however, return the best single estimate for the given situation, as also shown by their lower prediction errors.
[1] - Talebi, H., Mueller, U., Tolosana-Delgado, R. et al. (2019): Geostatistical Simulation of Geochemical Compositions in the Presence of Multiple Geological Units: Application to Mineral Resource Evaluation. Math Geosci 51, 129–153. https://doi.org/10.1007/s11004-018-9763-9
[2] - Makowski, D., Ben-Shachar, M., Patil, I., Lüdecke, D. (2020): Methods and Algorithms for Correlation Analysis in R. JOSS 5 (51), 2306. https://doi.org/10.21105/joss.02306
[3] - Tolosana-Delgado, R., Menzel, P. (2021): Block Course: Practical Geostatistics. TU Bergakademie Freiberg.
[4] - Friendly, M. (2021): Working with categorical data with R and the vcd and vcdExtra packages. York University, Toronto. https://cran.r-project.org/web/packages/vcdExtra/vignettes/vcd-tutorial.pdf
[5] - Olsen, L.R.: The available metrics in cvms. https://cran.r-project.org/web/packages/cvms/vignettes/available_metrics.html#multinomial-metrics
[6] - Journel, A.G., Rossi, M.E. (1989): When do we need a trend model in kriging? Math Geol 21, 715–739. https://doi.org/10.1007/BF00893318